Springer Nature 2021 TeX template 


Dynamic Bayesian Updating and Theoretical
Validation under Uncertain State Dynamics 
and Payoff Functions 


Jiangjing Zhou!, Ovanes Petrosian?t and Hongwei Gao 


1 Applied Mathematics and Control Processes, St.Petersburg 
State University, 7/9, Universitetskaya nab., Saint-Petersburg, 
199034, Saint-Petersburg, Russia. 

School of Automation, Qingdao University, 308 Ningxia Road, 

Qingdao, 266071, Shandong Province, China; 

Applied Mathematics and Control Processes, St.Petersburg State 
University, 7/9, Universitetskaya nab., Saint-Petersburg, 199034, 
Saint-Petersburg, Russia. 
3"School of Mathematics and Statistics, Qingdao University, 308 
Ningxia Road, Qingdao, 266071, Shandong Province, China. 


*Corresponding author(s). E-mail(s): gaohongwei@qdu.edu.cn; 
Contributing authors: st092028@student.spbu.ru; 
petrosian.ovanes@yandex.ru; 

†These authors contributed equally to this work.


Abstract 


In this paper, we investigate pollution control using game theory, with a focus on uncertainties in state dynamics and payoff functions. We introduce a dynamic Bayesian updating method to address unpredictable variables, such as the effect of wildfires on pollution absorption, and incomplete knowledge about other players' control costs. Our main contribution lies in the theorems we prove, which validate the accuracy and efficiency of our approach, particularly in improving uncertainty estimates. The effectiveness of our method is further demonstrated through simulation experiments. These experiments not only confirm the method's validity but also contrast optimal control strategies under deterministic and uncertain conditions, underscoring the practical applicability and robustness of our approach in real-world environmental management scenarios.




Keywords: Bayesian Updating, Uncertainty and Learning, Nash equilibrium 
with dynamic Bayesian updating, Hamilton-Jacobi-Bellman equation 


1 Introduction 


Game theory has emerged as a valuable tool in the analysis of multiplayer systems, primarily 
because of its rigorous mathematical framework for optimizing decision-making [1]. Dynamic 
games, in particular, have garnered increasing attention because they capture the essential 
aspect of players needing to consider how their payoffs evolve over time, as opposed to static, 
immediate costs for each action [2], [3], [4]. 

In the realm of dynamic games, the standard approach extends optimal control techniques from single-player problems to groups of players with both shared and conflicting interests. As demonstrated in [5], if a set of coupled partial differential equations, known as the Hamilton-Jacobi-Bellman (HJB) equations, admits a solution, then a Nash equilibrium of the game exists. In such an equilibrium, no player can improve its own performance by unilaterally deviating from its Nash control policy.

A drawback of traditional dynamic game solutions is their assumption that all players possess complete knowledge about all aspects of the game they are engaged in. In many real-world applications, players operate in rapidly changing and uncertain environments, leaving them with incomplete information about the game [6], [7].

In this context, we focus on two dimensions of uncertainty in dynamic games. First, there is uncertainty about unknown parameters in the dynamic (state) equations. We assume that all players are rational and share a common prior belief about the unknown parameters in the state equation. Second, there is uncertainty in the payoff functions: player j lacks complete information about the unknown parameters in the payoff functions of the other players. Since each player knows only its own payoff parameters, it must form beliefs about those of the remaining players.

The introductions of [8], [9], and [10] focus mainly on uncertainty in the dynamic equations. Paper [8] studies “learning in a dynamic game of international pollution, with ecological uncertainty.” Paper [9] examines the impact of learning on future payoffs and identifies two distinct sources of risk: “structural uncertainty” and “uncertainty due to the anticipation of learning.” In this context, “structural uncertainty” refers to the uncertainty inherent in the state equations. Finally, [10] addresses how players learn about the stochastic dynamics governing the evolution of public capital, including the influence of technological investment on future stock outcomes.

To address the uncertainties in the dynamic equations, our approach uses dynamic Bayesian updating: players receive signals about the environment and then update their beliefs about a particular uncertainty, such as the natural absorption rate. This updating is repeated every time the players receive new signals, in contrast to traditional Bayesian updating, which typically involves a single update. Moreover, we establish convergence properties of the players' estimates. Because the signals are random variables, the estimates, being functions of these signals, are random variables as well, and we show that they converge not only in value but also in variance. This convergence of the variance is, to our knowledge, a novel contribution not found in the existing literature. Our proof relies on key results such as Kolmogorov's Convergence Theorem [11], providing a rigorous theoretical foundation for
our approach. This continuous updating mechanism and its proven convergence properties 
offer a significant advancement in dynamic game theory, particularly in contexts where 
players must constantly adapt to evolving environmental conditions. 

In contrast, limited research has addressed uncertainty about unknown parameters in the payoff functions of individual players. The classical framework of Bayesian games [12] does handle payoff uncertainty caused by unknown parameters, but it assumes that the players' beliefs about these parameters are static. This overlooks a critical aspect: if players receive new information about the uncertain parameters at each stage of the game, their beliefs should be updated and evolve accordingly. There is a noticeable gap in current research regarding this process of belief evolution, particularly regarding how it influences decision-making and strategic choices.

To address uncertainty about the players' cost parameters in the payoff functions, we employ a dynamic Bayesian updating method in which players continually revise their beliefs about unknown parameters based on incoming signals [13]. We also introduce a new definition of a player's expected payoff that is grounded in the belief updates produced by dynamic Bayesian updating. This addresses a limitation of classical Bayesian games, in which players' beliefs are fixed, a condition that can significantly distort outcomes if the initial beliefs are inaccurate. In particular, we focus on the Kalman filter [14], a special case of dynamic Bayesian updating, and provide a detailed proof of the convergence of players' estimates under the Kalman filter, demonstrating its effectiveness in refining decision-making in dynamic games and multi-agent systems.

The rest of the paper is organized as follows. Section 2 provides the mathematical foundation of our research, and Sections 3 and 4 introduce the belief-updating methodology for uncertainty in the motion equation and in the payoff functions, respectively. Section 5 analyzes the decision-making dynamics of two asymmetric players and their interactions. Section 6 presents and analyzes the results of simulation experiments. Finally, Section 7 summarizes our findings and contributions.


2 Modeling and Mathematical Description of the Problem


2.1 Pollution Control Game Fundamentals 


In this game-theoretic model of pollution control, inspired by [15] and [8], a group of n players, each representing a nation, manages its emissions. Each player i controls its own emission level, which lies in the range $u_i \in [0, D_i]$, where $0 < D_i < \infty$.


2.2 Modeling the State Equation 


We track the cumulative net emissions in the environment, denoted by $S(t)$, at each time t. The dynamics of this stock variable are described by the following difference equation with an initial condition:

$$S(t+1) = \eta\Big(\sum_{i=1}^{n} u_i(t) + \delta S(t)\Big), \qquad S(t_0) = S_0, \tag{1}$$

where $0 < \delta < 1$, so that $1 - \delta$ is the natural decay rate of pollution, and $\eta$ is a random variable that introduces ecological uncertainty into the model. This variable plays a pivotal role in determining the rate at which emissions affect the environment. It aggregates many factors that are often underappreciated but essential to understanding the dynamics of pollution control.




The ecological uncertainty captured by $\eta$ is shaped by several influential factors:

1. Ecological System Dynamics [16]: This uncertainty is inherently linked to the complex and ever-changing dynamics of natural systems. Environmental factors such as seasonal variations, weather conditions, and ecosystem fluctuations can significantly influence $\eta$. These dynamics affect the rate at which emissions are absorbed, redistributed, or even accumulated.

2. Policy and Regulatory Fluctuations [17]: Ecological uncertainty is further compounded by the volatility of pollution control policies and regulations. These can evolve at both national and international levels, and policy changes may alter the efficacy and stringency of pollution control measures, directly affecting $\eta$.

3. Natural and Ecological Events [18]: Sudden and unpredictable natural events, such as wildfires, floods, and ecosystem disruptions, can cause rapid and substantial shifts in the environment. These events introduce profound uncertainty by affecting the capacity of ecosystems to absorb or mitigate emissions.

Let x denote a specific realization of the random variable $\eta$, whose probability distribution is described by the density $\phi(x \mid \theta)$. Here $\theta$, an element of the parameter space $\Theta \subseteq \mathbb{R}^2$, is the vector of sufficient parameters of the probability density function (p.d.f.) $\phi$. The support of this p.d.f., denoted by $H \subseteq \mathbb{R}$, is the set of possible values of x.

The players' beliefs about this unknown parameter are represented by $\xi(\theta)$. This formulation is particularly relevant in domains marked by uncertainty, such as environmental problems, where the unknown parameter may govern factors like the rate at which nature absorbs pollutants.

At each time step t, players must predict the realization of $\eta$ before making decisions. The random variable $\eta$ follows a normal distribution whose mean $\mu$ and variance $\sigma^2$ are initially unknown to the players.

However, players receive signals about $\eta$ at each time step, which allows them to gradually form beliefs about the unknown mean and variance. Before observing $x_t$, players use the historical signals $x_1, \dots, x_{t-1}$ to estimate $x_t$: they apply their current beliefs about the unknown mean and variance of $\eta$ to predict its value at time step t. Each time a new signal $x_t$ is received, they update these beliefs, and the procedure repeats.


2.3 Modeling the Payoff Function 


In addition, the payoff function of player $i \in N$ in these dynamic games has the form

$$K_i(u_1,\dots,u_n; S_0) = \max \sum_{t=0}^{T'} \Big( u_i(t,\tau_i,\cdot)\Big(a_i - u_i(t,\tau_i,\cdot) - \sum_{j\neq i} u_j(t,\tau_j,\cdot)\Big) - \tau_i S(t) \Big), \tag{2}$$

where $T'$ is the terminal time of the game and $a_i > 0$ is a constant indicating the maximum amount of pollution emissions that player i can influence. Each player possesses a private type $\tau_i \in T_i$, known only to itself, and $T = T_1 \times \dots \times T_n$ is the set of type profiles that the players can realize. The type $\tau_i$ represents the resources, time, and costs that player i must bear when implementing pollution control measures: player i pays $\tau_i$ to reduce its emissions or undertake other environmental protection actions. It therefore significantly affects player i's decisions, since it governs the trade-off between costs and benefits.
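Once a control path and a state path are fixed, the summand of (2) can be evaluated directly. The Python sketch below assumes the stage term $u_i(a_i - u_i - \sum_{j \neq i} u_j) - \tau_i S(t)$ as written in (2) and simply sums it over $t = 0, \dots, T'$; it does not perform the maximization. The helper names `stage_payoff` and `total_payoff` and the numbers in the example are hypothetical.

```python
from typing import Sequence


def stage_payoff(i: int, u: Sequence[float], a: Sequence[float],
                 tau: Sequence[float], S: float) -> float:
    """Stage term of (2): u_i * (a_i - u_i - sum_{j != i} u_j) - tau_i * S(t)."""
    others = sum(u_j for j, u_j in enumerate(u) if j != i)
    return u[i] * (a[i] - u[i] - others) - tau[i] * S


def total_payoff(i: int, controls: Sequence[Sequence[float]], a: Sequence[float],
                 tau: Sequence[float], states: Sequence[float]) -> float:
    """Sum of stage terms over t = 0..T' for fixed control and state paths (no maximization)."""
    return sum(stage_payoff(i, u_t, a, tau, S_t) for u_t, S_t in zip(controls, states))


# Hypothetical two-player example: a_i = 10, tau = (0.5, 0.8), S(t) = 4 in both periods.
print(total_payoff(0, [[1.0, 2.0], [1.0, 2.0]], [10.0, 10.0], [0.5, 0.8], [4.0, 4.0]))
```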

In the pollution control game, the uncertainty about $\tau_i$ arises because the other players (e.g., player $j \neq i$) do not know its true value. This uncertainty can stem from several factors:




1. Information Asymmetry [19]: Other players lack full information about the actual value of $\tau_i$, because player i may choose not to disclose its exact pollution control cost.

2. Measurement Errors [20]: In an emissions permit trading market, company j can use observed permit prices to estimate the costs of player i. However, practical constraints limit how accurately $\tau_i$ can be measured, leading to estimation errors. Thus, even if j observes higher permit prices, it may not be able to determine i's exact cost parameter $\tau_i$, which highlights the impact of measurement errors on cost estimation.

3. Strategic Actions [21]: Player i may take strategic actions to obscure other players' understanding of $\tau_i$, for instance because it does not trust player j. Such deliberate obfuscation further increases the uncertainty in the game.

Player i receives a payoff that depends not only on the actions of all players, $u_j \in U_j$, as in complete information games, but also on its realized private type $\tau_i$. These games therefore feature incomplete information, since each player must choose its strategy without knowing the realized types of the other players.

Next, we provide a definition for the game described above: 


Definition 1 A dynamic game with Bayesian updating for uncertainties in the motion equation and payoff functions is defined as a tuple $(N, S, U, \theta, T, \xi, p, K)$, where:

• $N$ is the set of players in the game.

• $S$ is the set of reachable states.

• $U = U_1 \times U_2 \times \dots \times U_n$, with $U_i$ being the set of admissible controllers for player i.

• $\theta$ represents the sufficient statistic for the random variable $\eta$.

• $T = T_1 \times T_2 \times \dots \times T_n$ represents uncertainty in the payoff function, with $\tau_i$ being the type of player i.

• The common prior $\xi$ describes prior beliefs about $\theta$. Players observe only the realized value x at each period, and each player uses this information to update its beliefs about $\theta$.

• The common prior over types $p: T \to [0,1]$ describes the probability of finding every player i in type $\tau_i \in T_i$.

• The performance indices $K_i^{\tau}$ represent the costs for each player associated with a given control policy, a state value, and a particular combination of types.


In incomplete information games, players cannot directly use the payoff functions (2) to define equilibrium concepts, because they do not know the types of the other players. We therefore introduce the concept of expected payoff. By definition (2), player i's payoff depends on its own type $\tau_i$ and on the strategies of the other players.

The classic ex-interim expected cost of player i is computed when player i knows its own type but not the types of the other players. This situation arises when players evaluate their expected costs after the game has already begun. In a Bayesian game in which players use policies $u_i$ and player i has type $\tau_i$, the classic ex-interim expected payoff [22], [23] can be expressed as follows:


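$$K_i^{\tau_i}(u_1,\dots,u_n; S_0) = \max \sum_{t=0}^{T'} \Big[ \sum_{\tau\in T} p(\tau\mid\tau_i)\, u_i(t,\tau_i)\Big(a_i - u_i(t,\tau_i) - \sum_{j\neq i} u_j(t,\tau_j)\Big) - \tau_i S(t) \Big], \tag{3}$$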




where $p(\tau\mid\tau_i)$ is the probability of the type profile $\tau$ given that player i has type $\tau_i$, and the summation index $\tau$ ranges over all possible type profiles in the game.

The classical definition of the expected payoff assumes a known Bayesian game model in which the player types and their probability distributions are fixed. In practical decision-making, however, uncertainty about other players' types is common and can be updated in real time as signals are received. To capture this dynamism, we propose an approach that incorporates signals and belief updates into the definition of the expected payoff function. This dynamic approach better represents the complexity of real-world decision-making and departs from the traditional assumption of fixed type probabilities in Bayesian games.


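$$EK_i^{\tau_i}(u_1,\dots,u_n; S_0) = \max \sum_{t=0}^{T'} \Big[ \sum_{\substack{\tau_j \in T_j \\ j \in N,\ j\neq i}} p^t\big(\tau_j \mid y_j(t-1)\big)\, u_i(t,\tau_i)\Big(a_i - u_i(t,\tau_i) - \sum_{j\neq i} u_j(t,\tau_j)\Big) - \tau_i S(t) \Big], \tag{4}$$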


where:

• $p^t(\tau_j \mid y_j(t-1))$ is player i's belief at time t about the type $\tau_j$ of player j, based on the signal $y_j(t-1)$ received in the previous time step $t-1$.

• $u_i(t,\tau_i,\cdot)$ is the decision of player i at time t, which depends on its type $\tau_i$.

• $\sum_{j\neq i} u_j(t,\tau_j,\cdot)$ is the total of the decisions of the other players j at time t, each depending on its respective type $\tau_j$.

• $\tau_i$ is the cost coefficient of player i.


The newly defined expected payoff function incorporates belief updates based on signals. Its core innovation is that a player's beliefs change over time as the player receives signals. Unlike the fixed conditional distribution $p(\tau\mid\tau_i)$, this formulation accounts for the evolution of beliefs over time and for the players' dynamic responses to unknown parameters. It thus introduces richer dynamic elements into the game model, bringing it closer to real-world decision-making and strategy formulation.
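To make the difference between (3) and (4) concrete, the Python sketch below evaluates the bracketed sum in (4) for a two-player game, in which player i weights each possible type of player j by a belief that may change from period to period. The belief sequence, the type labels, and the function name `dynamic_expected_payoff` are hypothetical, and how the beliefs themselves are produced (for example, by the Kalman filter of Section 4) is outside this snippet. Passing the same belief in every period reproduces the fixed-belief weighting of (3) in the two-player case.

```python
from typing import Dict, Sequence


def dynamic_expected_payoff(
    u_i: Sequence[float],                 # u_i(t, tau_i) for t = 0..T'
    a_i: float,
    tau_i: float,
    S: Sequence[float],                   # state path S(t)
    u_j: Sequence[Dict[str, float]],      # u_j[t][type] = u_j(t, tau_j)
    beliefs: Sequence[Dict[str, float]],  # beliefs[t][type] = p^t(tau_j | y_j(t-1))
) -> float:
    """Two-player evaluation of the sum in (4) for fixed control paths (no maximization)."""
    total = 0.0
    for t in range(len(u_i)):
        if abs(sum(beliefs[t].values()) - 1.0) > 1e-9:
            raise ValueError(f"beliefs at t={t} must sum to 1")
        # Expected stage revenue of player i, weighting j's possible types by the period-t belief.
        expected = sum(p * u_i[t] * (a_i - u_i[t] - u_j[t][typ])
                       for typ, p in beliefs[t].items())
        total += expected - tau_i * S[t]
    return total


# Hypothetical: player j is a "low" or "high" cost type; i's belief shifts toward "high".
u_j = [{"low": 3.0, "high": 1.0}] * 2
beliefs = [{"low": 0.5, "high": 0.5}, {"low": 0.25, "high": 0.75}]
print(dynamic_expected_payoff([2.0, 2.0], 10.0, 0.5, [4.0, 4.0], u_j, beliefs))
```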


3 Belief Updating for Uncertainty in Motion Equation 


In this section, we explore how players update their beliefs in the presence of uncertainty in the state equation. Such uncertainty can arise from unknown parameters, so it is crucial to understand how to update beliefs effectively. We first introduce the conditional expectation method for predicting the realization $x_t$ of the random variable at time t, and then describe how the players integrate new signals as they arrive to improve the estimation.

In the final part of this section, we explain in detail how the players update their beliefs about unknown parameters after receiving signals. We demonstrate the convergence of this iterative process: over time, the players' beliefs about the unknown parameters approach the true values. This theoretical result has important implications for making accurate decisions in uncertain environments.


3.1 Conditional Expectation Approach for $x_t$ Prediction


To learn the mean and precision of a normal distribution, we start with a random variable $\eta$ that follows a normal distribution. Its parameters, $\theta' = (\mu, \sigma^2)$, are unknown, and at each stage $t = 0, 1, \dots, T'$ the players hold prior beliefs represented by $\xi(\theta')$.

In Bayesian estimation, we work with the precision of the Gaussian distribution, which is the reciprocal of the variance: $\lambda = 1/\sigma^2$. The precision is always positive; high precision indicates low variance, and low precision indicates high variance.

When performing Bayesian estimation, we use $\mu$ and $\lambda$, combined into a single parameter vector $\theta = (\mu, \lambda)$ that describes the distribution. Here, $\mu$ is the mean of the normal distribution, and the precision $\lambda$ is the inverse of its variance.

Applying Bayesian updating at each stage yields estimates $\mu_t$ and $\lambda_t$. Next, we explain how to use these Bayesian estimates to predict the value of $\eta$ at stage t via the conditional expectation

$$\hat{x}_t = E(\eta \mid \theta_t) = E(\eta \mid \mu_t, \lambda_t), \qquad t = 0, 1, 2, \dots, T', \tag{5}$$

where $\mu_t$ and $\lambda_t$ are the Bayesian prior estimates of the unknown parameters $\mu$ and $\lambda$ at stage t.

To help readers and avoid confusion, Table 1 lists the key symbols used in this article and their meanings.


Table 1: Key Notation Definitions

Symbol                   Meaning
$\Theta$                 Sample space of the unknown parameters
$\theta$                 Distribution parameter (random variable)
$\theta_t$               Prior estimator of $\theta$ at stage t
$\hat{\theta}_t$         Posterior estimator of $\theta$ at stage t
$\xi_t(\theta)$          Prior belief of $\theta$ at stage t
$\hat{\xi}_t(\theta)$    Posterior belief of $\theta$ at stage t
$X_t$                    Set of signals at stage t
$x_t$                    Realization of signals at stage t


Following the discussion on predicting the realization of 7 at time t using Bayesian 
estimation, we will now elaborate on the process of obtaining Bayesian estimates based on 
received signals. Additionally, we will introduce critical assumptions that enable us to extend 
our Bayesian updates to dynamic updating, meaning that the updating process occurs not 
just once but whenever the players receive new signals. 


3.2 Incorporating New Signals for Estimation 


Because both the mean and the variance of $\eta$ are unknown, we cannot derive their estimates from separate belief distributions for the mean and the variance, since the players' beliefs about the mean depend on their beliefs about the variance.

We therefore first obtain the joint probability distribution of $\mu$ and $\lambda$, and then use it to compute the marginal distribution of each parameter. This yields separate beliefs about the mean and the precision.

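At stage $t = 0, 1, 2, \dots, T'$, $\xi_t(\theta)$ denotes the prior joint distribution of the parameters $\mu$ and $\lambda$, defined as follows:

$$\xi_t(\theta) \overset{\text{def}}{=} \mathcal{N}\big(\mu \mid \mu_t, (\kappa_t\lambda)^{-1}\big)\, \mathrm{Ga}\big(\lambda \mid \alpha_t, \text{rate} = \beta_t\big), \tag{6}$$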




where $\mu_t$ is the prior estimate (belief) of the mean parameter $\mu$ at stage t; $\kappa_t$ scales the precision of the normal distribution for $\mu$, with higher values of $\kappa_t$ indicating higher precision; $\alpha_t$ is the shape parameter and $\beta_t$ is the rate parameter of the gamma distribution for the precision $\lambda$.

The key idea is to treat the unknown mean and precision as dependent variables, since the estimate of the mean depends on the estimate of the precision and vice versa. The joint distribution captures this relationship, and marginalizing it afterwards yields separate belief distributions for the mean and the precision, which gives a more accurate estimate of the two unknown parameters than treating them in isolation. Having obtained the joint distribution of $\mu$ and $\lambda$, we now derive the players' beliefs about $\mu$ and $\lambda$ separately.

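The marginal distributions of $\mu$ and $\lambda$ are obtained from the joint distribution as follows:

$$\xi_t(\mu) = \int_0^{\infty} \xi_t(\mu,\lambda)\, d\lambda, \qquad \xi_t(\lambda) = \int_{-\infty}^{\infty} \xi_t(\mu,\lambda)\, d\mu, \qquad t = 0, 1, 2, \dots, T', \tag{7}$$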


where $\xi_t(\mu)$ is the players' marginal belief about $\mu$, obtained by integrating out $\lambda$, and $\xi_t(\lambda)$ is their marginal belief about $\lambda$, obtained by integrating out $\mu$.
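The marginalization in (7) can be checked numerically. For the normal-gamma prior (6), a standard result is that $\xi_t(\lambda)$ is the gamma density itself and $\xi_t(\mu)$ is a Student-t density with $2\alpha_t$ degrees of freedom, location $\mu_t$, and squared scale $\beta_t/(\alpha_t\kappa_t)$. The Python sketch below integrates the joint density over $\lambda$ with a simple midpoint rule and compares the result with that closed form; the parameter values, the integration range, and the grid size are arbitrary illustrative choices.

```python
import math


def normal_gamma_pdf(mu, lam, m, kappa, alpha, beta):
    """Joint density (6): N(mu | m, (kappa*lam)^-1) * Ga(lam | alpha, rate=beta)."""
    normal = math.sqrt(kappa * lam / (2 * math.pi)) * math.exp(-0.5 * kappa * lam * (mu - m) ** 2)
    log_gamma = (alpha * math.log(beta) - math.lgamma(alpha)
                 + (alpha - 1) * math.log(lam) - beta * lam)
    return normal * math.exp(log_gamma)


def marginal_mu_numeric(mu, m, kappa, alpha, beta, upper=40.0, n=20000):
    """Approximate xi_t(mu) = int_0^inf xi_t(mu, lam) d(lam) with the midpoint rule.

    The range (0, upper] and grid size n are illustrative; the tail beyond
    `upper` is assumed negligible for the parameters used below.
    """
    h = upper / n
    return h * sum(normal_gamma_pdf(mu, (k + 0.5) * h, m, kappa, alpha, beta) for k in range(n))


def student_t_pdf(x, nu, loc, scale):
    """Student-t density with nu degrees of freedom, location loc, and scale."""
    z = (x - loc) / scale
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(scale))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(z * z / nu))


m, kappa, alpha, beta = 0.3, 4.0, 3.0, 2.0  # hypothetical belief parameters
for mu in (-1.0, 0.3, 1.5):
    numeric = marginal_mu_numeric(mu, m, kappa, alpha, beta)
    closed = student_t_pdf(mu, 2 * alpha, m, math.sqrt(beta / (alpha * kappa)))
    print(f"mu={mu:+.1f}  numeric={numeric:.8f}  closed form={closed:.8f}")
```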

Next, we describe the information available to the players and their learning process. Although the players do not know the true value $\theta^*$ of the parameter of the distribution of $\eta$, they share a common prior belief about it, represented by a normal-gamma prior probability density function (p.d.f.) on the parameter space $\Theta$.

To construct the likelihood function for the unknown parameters $\mu$ and $\lambda$, we assume that the observed signal follows a normal distribution with mean $\mu$ and precision $\lambda$:

$$g(x_t \mid \mu, \lambda) = \sqrt{\frac{\lambda}{2\pi}} \exp\Big(-\frac{\lambda}{2}(x_t - \mu)^2\Big).$$


In other words, this equation describes how the observed signals are distributed: $\mu$ is the center of the distribution, and $\lambda$ indicates how tightly the signals cluster around $\mu$. This is the basis of the players' Bayesian learning process, allowing them to update their beliefs about $\theta^*$ from the signals they observe.

Having specified the prior joint distribution and the likelihood, we now show how Bayesian inference uses newly acquired data to update the beliefs about $\theta$.


Proposition 1 After observing the signal $x_t \in X_t$, $t = 0, 1, 2, \dots, T'$, the posterior distribution of the parameter $\theta$ obtained by Bayesian inference is

$$\hat{\xi}_t(\theta) \propto \mathcal{N}\Big(\mu \,\Big|\, \frac{\kappa_t\mu_t + x_t}{\kappa_t + 1}, \big((\kappa_t + 1)\lambda\big)^{-1}\Big) \times \mathrm{Ga}\Big(\lambda \,\Big|\, \alpha_t + \frac{1}{2}, \beta_t + \frac{\kappa_t(x_t - \mu_t)^2}{2(\kappa_t + 1)}\Big),$$

which means that, after observing the signal $x_t$, our beliefs about $\theta$ move from the prior distribution $\xi_t(\theta)$ to the posterior distribution $\hat{\xi}_t(\theta)$.




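Proof After observing the signal $x_t$, $t = 0, 1, 2, \dots, T'$, the posterior distribution of $\theta$ is computed by multiplying the prior by the likelihood function:

$$\begin{aligned}
\hat{\xi}_t(\theta) &\propto g(x_t \mid \mu, \lambda)\, \xi_t(\theta) \\
&\propto \lambda^{1/2} e^{-\kappa_t\lambda(\mu-\mu_t)^2/2}\, \lambda^{\alpha_t-1} e^{-\beta_t\lambda} \times \lambda^{1/2} e^{-\frac{\lambda}{2}(x_t-\mu)^2} \\
&\propto \lambda^{1/2}\, \lambda^{\alpha_t+1/2-1}\, e^{-\beta_t\lambda}\, e^{-\frac{\lambda}{2}\left[\kappa_t(\mu-\mu_t)^2 + (x_t-\mu)^2\right]}.
\end{aligned}$$

Completing the square in $\mu$ gives

$$\kappa_t(\mu-\mu_t)^2 + (x_t-\mu)^2 = (\kappa_t+1)(\mu-\mu_{t+1})^2 + \frac{\kappa_t(x_t-\mu_t)^2}{\kappa_t+1},$$

where $\mu_{t+1} = \frac{\kappa_t\mu_t + x_t}{\kappa_t+1}$. Therefore,

$$\begin{aligned}
\hat{\xi}_t(\theta) &\propto \lambda^{1/2} e^{-\frac{\lambda}{2}(\kappa_t+1)(\mu-\mu_{t+1})^2} \times \lambda^{\alpha_t+1/2-1} e^{-\lambda\left(\beta_t + \frac{\kappa_t(x_t-\mu_t)^2}{2(\kappa_t+1)}\right)} \\
&\propto \mathcal{N}\Big(\mu \,\Big|\, \frac{\kappa_t\mu_t + x_t}{\kappa_t+1}, \big((\kappa_t+1)\lambda\big)^{-1}\Big) \times \mathrm{Ga}\Big(\lambda \,\Big|\, \alpha_t + \frac{1}{2}, \beta_t + \frac{\kappa_t(x_t-\mu_t)^2}{2(\kappa_t+1)}\Big). \qquad \square
\end{aligned}$$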
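A quick numerical sanity check of Proposition 1 (a sketch, not part of the original analysis): if the stated posterior parameters are correct, then $\log[g(x_t\mid\mu,\lambda)\,\xi_t(\theta)] - \log\hat{\xi}_t(\theta)$ must equal the same constant, namely the log marginal likelihood of $x_t$, at every point $(\mu,\lambda)$. The Python code below evaluates this difference at a few arbitrary points; the function names and all numeric values are illustrative.

```python
import math


def log_ng(mu, lam, m, kappa, alpha, beta):
    """Log of the normal-gamma density N(mu | m, (kappa*lam)^-1) * Ga(lam | alpha, rate=beta)."""
    return (0.5 * math.log(kappa * lam / (2 * math.pi)) - 0.5 * kappa * lam * (mu - m) ** 2
            + alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1) * math.log(lam) - beta * lam)


def log_lik(x, mu, lam):
    """Log of g(x | mu, lam): a normal signal with mean mu and precision lam."""
    return 0.5 * math.log(lam / (2 * math.pi)) - 0.5 * lam * (x - mu) ** 2


def posterior_params(x, m, kappa, alpha, beta):
    """Posterior parameters stated in Proposition 1."""
    return ((kappa * m + x) / (kappa + 1), kappa + 1, alpha + 0.5,
            beta + kappa * (x - m) ** 2 / (2 * (kappa + 1)))


m, kappa, alpha, beta, x = 0.2, 2.0, 1.5, 0.8, 1.1  # hypothetical prior and signal
post = posterior_params(x, m, kappa, alpha, beta)
points = [(-1.0, 0.5), (0.0, 1.0), (0.7, 2.5), (2.0, 4.0)]
# log prior + log likelihood - log posterior should be the same constant, log p(x), everywhere.
diffs = [log_ng(mu, lam, m, kappa, alpha, beta) + log_lik(x, mu, lam) - log_ng(mu, lam, *post)
         for mu, lam in points]
print(max(diffs) - min(diffs))  # close to zero, up to rounding error
```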


This describes a single Bayesian update: players use the signal received at time t, $x_t$, to update their beliefs about the unknown parameters $\mu$ and $\lambda$ from the prior distribution $\xi_t(\theta)$ to the posterior distribution $\hat{\xi}_t(\theta)$.

The next question is how to carry the posterior beliefs obtained at time t over to the next stage, $t + 1$. We adopt the following key assumption:


Assumption 1 The posterior belief about the unknown parameter $\theta$ at stage t coincides with the prior distribution at stage $t + 1$.


In other words, the updated beliefs at stage t become the prior beliefs at the subsequent stage $t + 1$. This assumption is a crucial part of the updating process: it allows beliefs to pass smoothly from one stage to the next, so that Bayesian updating continues consistently as players receive new information at every stage.


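Proposition 2 The successive belief update process is described by the following set of equations, $t = 0, 1, 2, \dots, T'$:

$$\begin{aligned}
\mu_{t+1} &= \frac{\kappa_t\mu_t + x_t}{\kappa_t + 1}, \\
\kappa_{t+1} &= \kappa_t + 1, \\
\alpha_{t+1} &= \alpha_t + \frac{1}{2}, \\
\beta_{t+1} &= \beta_t + \frac{\kappa_t(x_t - \mu_t)^2}{2(\kappa_t + 1)},
\end{aligned} \tag{8}$$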

where the initial beliefs $\mu_0$, $\kappa_0$, $\alpha_0$, and $\beta_0$ are iteratively refined as new observations accumulate. This process yields a dynamic and consistent updating of beliefs over time.




Proof The iterative formula follows directly from Assumption 1. Since the posterior belief at stage t coincides with the prior distribution at stage $t + 1$, the posterior parameters given in Proposition 1 become the prior parameters of stage $t + 1$, which yields (8).


This implies that players need not start afresh at each new stage by redefining their prior beliefs and recalculating the posterior distribution of the unknown parameters. Instead, they define their prior beliefs at each stage from the outcome of the previous stage, allowing a continuous flow of Bayesian updates.

In practical terms, as new information becomes available in successive stages, players simply update their belief parameters using (8). This keeps the process efficient and manageable and ensures a smooth, continuous learning process without a complete reinitialization at each stage.
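Proposition 2 translates into a few lines of code. The Python sketch below stores the four belief parameters in a small immutable record and applies (8) to a stream of signals, so that each posterior is reused as the next prior as Assumption 1 prescribes. The class and function names, the initial belief, and the signals are hypothetical; the usage example checks the recursion against the unrolled form of the mean update.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class NormalGammaBelief:
    """Belief parameters (mu_t, kappa_t, alpha_t, beta_t) of the normal-gamma prior (6)."""
    mu: float
    kappa: float
    alpha: float
    beta: float

    def update(self, x: float) -> "NormalGammaBelief":
        """One step of (8): the posterior after signal x becomes the next stage's prior."""
        k = self.kappa
        return NormalGammaBelief(
            mu=(k * self.mu + x) / (k + 1),
            kappa=k + 1,
            alpha=self.alpha + 0.5,
            beta=self.beta + k * (x - self.mu) ** 2 / (2 * (k + 1)),
        )


def run_updates(prior: NormalGammaBelief, signals: Sequence[float]) -> List[NormalGammaBelief]:
    """Apply (8) to a stream of signals; returns the beliefs at stages 0, 1, ..., len(signals)."""
    beliefs = [prior]
    for x in signals:
        beliefs.append(beliefs[-1].update(x))
    return beliefs


prior = NormalGammaBelief(mu=0.0, kappa=1.0, alpha=1.0, beta=1.0)  # hypothetical initial belief
signals = [2.0, 4.0, 1.0, 3.0]                                     # hypothetical signals
beliefs = run_updates(prior, signals)
# Unrolling (8) gives mu_t = (kappa_0 * mu_0 + sum of the first t signals) / (kappa_0 + t).
closed_form = (prior.kappa * prior.mu + sum(signals)) / (prior.kappa + len(signals))
print(beliefs[-1].mu, closed_form)
```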

Let $M_t$ be the random variable associated with the belief $\mu_t$. We now show that, as t tends to infinity, $M_t$ converges to the mean $\mu$ of $\eta$. Because $M_t$ is a function of the random signals, this means that for almost every realized sequence of observations, the estimate $M_t$ stabilizes over the long run and approaches the true unknown mean $\mu$.

In the following two proofs, we assume $t \geq 1$, because the belief at $t = 0$ is given: the initial beliefs are fixed at the outset of the analysis. Restricting attention to $t \geq 1$ lets us study how beliefs evolve as they are updated in response to incoming signals, which is the central aim of our study of belief evolution in dynamic Bayesian settings.


Theorem 3 (Strong convergence) The estimator of the unknown mean tends to the true value, that is,

$$\lim_{t\to\infty} M_t = \mu,$$

where $M_t$ is the random variable describing the belief $\mu_t$ about the unknown mean $\mu$ at stage $t \geq 1$.


Proof The proof is given in Appendix A.


The convergence of $M_t$ therefore establishes the stability and accuracy of our estimates: it provides a consistent estimator of the unknown mean that tends to the true value for almost every realized sequence of observations.


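Proposition 4 (Kolmogorov's Convergence Theorem) [24] Suppose that $X_0, X_1, X_2, \dots$ are independent random variables, and let $D(X_n)$ denote the variance of $X_n$. If $\sum_{n=0}^{\infty} D(X_n) < \infty$, then $\sum_{n=0}^{\infty} (X_n - E(X_n))$ converges a.s.

In particular,

$$\sum_{n=0}^{\infty} D(X_n) < \infty \ \text{ and } \ \sum_{n=0}^{\infty} E(X_n) \ \text{ converges} \ \Longrightarrow \ \sum_{n=0}^{\infty} X_n \ \text{ converges a.s.}$$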


Theorem 5 The variance of the estimator of the unknown mean tends to 0 almost surely. In other words,

$$\lim_{t \to \infty} \frac{B_t}{K_t (A_t - 1)} = 0 \quad \text{a.s.}, \qquad (9)$$

where $\frac{B_t}{K_t (A_t - 1)}$ is a random variable representing the variance of the estimator of the unknown mean at stage $t \ge 1$, and $K_t$, $A_t$, $B_t$ are the beliefs at stage $t$.


Proof. The proof is given in Appendix B.
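Theorems 3 and 5 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes the beliefs $(m_t, \kappa_t, \alpha_t, \beta_t)$ follow the standard Normal-Gamma conjugate update for independent Gaussian observations of the ecological uncertainty, under which $\beta_t/(\kappa_t(\alpha_t - 1))$ is the posterior variance of the unknown mean. The function names, the prior at $t = 0$, the true mean `mu_true`, and the noise level `sigma_true` are all hypothetical.

```python
import numpy as np


def normal_gamma_update(m, kappa, alpha, beta, x):
    """Standard Normal-Gamma conjugate update with one observation x (assumed model)."""
    m_new = (kappa * m + x) / (kappa + 1.0)
    beta_new = beta + kappa * (x - m) ** 2 / (2.0 * (kappa + 1.0))
    return m_new, kappa + 1.0, alpha + 0.5, beta_new


def simulate(mu_true=0.7, sigma_true=0.1, steps=5000, seed=0):
    """Return the final mean belief m_t and the variance term beta_t / (kappa_t (alpha_t - 1))."""
    rng = np.random.default_rng(seed)
    # Hypothetical prior at t = 0; alpha > 1 keeps the variance term finite.
    m, kappa, alpha, beta = 0.5, 1.0, 2.0, 1.0
    for x in rng.normal(mu_true, sigma_true, size=steps):
        m, kappa, alpha, beta = normal_gamma_update(m, kappa, alpha, beta, x)
    return m, beta / (kappa * (alpha - 1.0))


for steps in (10, 100, 1000, 5000):
    m, var_mu = simulate(steps=steps)
    print(f"t={steps:5d}  m_t={m:.4f}  var={var_mu:.2e}")
```

As more observations arrive, the mean belief settles near `mu_true` and the variance term shrinks roughly like $1/t$, which mirrors the behaviour stated in (9).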


4 Belief Updating for Uncertainty in Payoff Functions 


In player $i$'s payoff function, the parameter $\tau_i$ is known only to player $i$ and is unknown to all other players $j \ne i$. Consequently, the other players treat $\tau_i$ as an unknown variable. We regard all possible values of $\tau_i$ as the potential types of player $i$ and define $T_i$ as the set of possible types of player $i$. This set is common knowledge among all players. The uncertainty matters because the payoff that player $j$ maximizes depends on the strategies of all other players, and each player's strategy depends on that player's own type.


4.1 Application of the Kalman Filter Method 


The Kalman filter is a specialized form of Bayesian updating [25]: an algorithm that provides an efficient recursive means of estimating the state of a process or, as here, an unknown parameter.

In this context, we assume that player $i$'s cost of pollution abatement $\tau_i$ remains constant [3] [15] and is unknown to the other players.

Since it is common knowledge that $\tau_i$ is constant, player $j$ can assume that the value of $\tau_i$ is the same at time steps $t$ and $t+1$ and write it simply as $\tau_i$:

$$\tau_i(t+1) = \tau_i(t) = \tau_i, \qquad t = 0, 1, 2, \dots, T'. \qquad (10)$$

The market permit prices ($y_i$) of player $i$ observed by player $j$ may be subject to randomness, which can represent various unknown factors such as market fluctuations, policy changes, technological advancements, and more.

The challenge is that player $j$ does not know the exact type of player $i$; instead, it observes a signal $y_i(t)$ that is a perturbed value of $\tau_i$. The signal could be derived from sources such as market data or environmental measurements, which often carry inherent uncertainty. Modeling the signal in this way acknowledges the difficulty of making accurate estimates under uncertainty.

We further assume that the observations of $\tau_i$ are obtained through an emissions permit trading market and can be represented by a linear equation of the form

$$y_i(t) = \tau_i + w_i(t), \qquad t = 0, 1, 2, \dots, T', \qquad (11)$$

where $y_i(t)$ is the signal received by the remaining players $j \ne i$ about the unknown parameter $\tau_i$, and $w_i(t)$ is additive measurement noise.

We make the following assumptions:
1. The measurement noise process $w_i(t)$, $i \in N$, is zero-mean white noise with known variance $R_i$, that is,

$$E[w_i(k) w_i(l)] = \begin{cases} R_i, & k = l, \\ 0, & \text{otherwise}. \end{cases} \qquad (12)$$


Assuming that the variance of the disturbance is known means that we know the magnitude of this randomness. This tells us the extent of market price fluctuations and the resulting uncertainty in estimating player $i$'s costs. A large variance signifies strong market price volatility and hence a higher degree of uncertainty in player $j$'s estimate of player $i$'s costs. This assumption allows the model to account precisely for the impact of such uncertainty.
2. The initial belief $\tau_i^0$ of player $j$ about $\tau_i$ and its variance $P_i^0 = E_j[(\tau_i^0 - \tau_i)^2]$ are given.

The primary concern when dealing with linear Gaussian models is learning, or system identification: determining the parameter $\theta_i = \tau_i$ that maximizes the likelihood of a given observed output sequence (or possibly several sequences) $\{y_i(0), \dots, y_i(t)\}$, $t = 0, 1, 2, \dots, T'$.


4.2 Derivation of $\tau_i$ Estimate Values


Below, we present the Kalman filter equations. Given player $j$'s initial belief $\tau_i^0$ about $\tau_i$, the initial variance $P_i^0$, and the signal variance $R_i$, we derive the player's predictions and updates for $t \ge 0$.

1. **Prediction Step at time t + 1**:

- Predicted estimate of the state (prior belief):

$$\bar{\tau}_i(t+1) = \hat{\tau}_i(t), \qquad t \ge 0,$$

where $\bar{\tau}_i(t+1)$ is player $j$'s prior belief about $\tau_i$ at time $t+1$ and $\hat{\tau}_i(t)$ is player $j$'s posterior belief at time $t$, which is consistent with Assumption 1. The initial posterior is $\hat{\tau}_i(0) = \tau_i^0 + \frac{P_i^0}{P_i^0 + R_i}\left(y_i(0) - \tau_i^0\right)$, which can be derived from Eq. (15).


- Predicted variance (prior variance):

$$P_i(t+1 \mid t) = P_i(t), \qquad t \ge 0, \qquad (13)$$

where the left side of Eq. (13) is the prior estimate variance at time $t+1$ and the right side is the posterior estimate variance at time $t$. And $P_i(0) = P_i^0$.
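2. **Update Step at time t + 1**:
- Calculation of the Kalman gain:

$$K_i(t+1) = \frac{P_i(t+1 \mid t)}{P_i(t+1 \mid t) + R_i} = \frac{P_i(t)}{P_i(t) + R_i}. \qquad (14)$$

- Updated estimate of the state (posterior belief):

$$\hat{\tau}_i(t+1) = \bar{\tau}_i(t+1) + K_i(t+1)\left(y_i(t+1) - \bar{\tau}_i(t+1)\right) = \hat{\tau}_i(t) + \frac{P_i(t)}{P_i(t) + R_i}\left(y_i(t+1) - \hat{\tau}_i(t)\right). \qquad (15)$$

- Updated variance (posterior variance):

$$P_i(t+1) = \left(1 - K_i(t+1)\right) P_i(t+1 \mid t) = \frac{P_i(t) R_i}{P_i(t) + R_i}. \qquad (16)$$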


These equations describe the prediction and update steps of the Kalman filter, where $\hat{\tau}_i(t+1)$ is the posterior belief at time step $t+1$. It is based on the observed data $y_i(t+1)$, the prior belief $\bar{\tau}_i(t+1)$, and the prior variance $P_i(t+1 \mid t)$.

Using these equations, the Kalman filter incrementally estimates and corrects the unknown variable $\tau_i$ based on the observed data $y_i(t)$ and the state equation $\tau_i(t+1) = \tau_i(t) = \tau_i$. The objective is to obtain the optimal state estimate despite the noise, and thus the most accurate estimate possible.
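The prediction and update steps above can be sketched as a scalar filter. This is a minimal illustration assuming the observation model of Eq. (11) with a known noise variance; the function name `kalman_constant_parameter`, the initial belief `tau0 = 1.5`, its variance `p0 = 1.0`, and the noise variance `r = 0.25` are hypothetical choices, and only the true cost $\tau_1 = 2$ is taken from the simulation setup.

```python
import numpy as np


def kalman_constant_parameter(y, tau0, p0, r):
    """Scalar Kalman filter for a constant parameter observed as y(t) = tau + w(t).

    tau0, p0: initial belief and its variance before y(0) is observed.
    r: known noise variance R_i.
    Returns posterior estimates tau_hat(t) and variances P(t), t = 0, ..., len(y) - 1.
    """
    tau_hat, p = tau0, p0
    estimates, variances = [], []
    for y_t in y:
        # Prediction step: tau_i is constant, so the prior equals the last posterior
        # and the prior variance equals the last posterior variance (Eq. (13)).
        tau_prior, p_prior = tau_hat, p
        # Kalman gain, Eq. (14).
        k = p_prior / (p_prior + r)
        # Posterior belief, Eq. (15): correct the prior by the innovation.
        tau_hat = tau_prior + k * (y_t - tau_prior)
        # Posterior variance, Eq. (16): (1 - K) P = P R / (P + R).
        p = (1.0 - k) * p_prior
        estimates.append(tau_hat)
        variances.append(p)
    return np.array(estimates), np.array(variances)


# Usage: hypothetical R and initial belief; true cost tau_1 = 2.
rng = np.random.default_rng(1)
tau_true, r = 2.0, 0.25
y = tau_true + rng.normal(0.0, np.sqrt(r), size=101)  # signals y(0), ..., y(100)
est, var = kalman_constant_parameter(y, tau0=1.5, p0=1.0, r=r)
print(est[[0, 10, 100]], var[[0, 10, 100]])
```

Because the gain $P_i(t)/(P_i(t)+R_i)$ shrinks over time, later signals move the estimate less and less; this is the behaviour formalized in Theorems 6 and 7.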




Theorem 6 For the Kalman filter constructed from Eq. (10) and Eq. (11),

$$\lim_{t \to \infty} P_i(t+1 \mid t) = 0, \qquad \forall i \in N,$$

where $P_i(t+1 \mid t)$ represents the uncertainty in the prior belief $\bar{\tau}_i(t+1)$ at time $t+1$, based on the information available at time $t$.


Proof. The proof is given in Appendix C. □
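Theorem 6 can be made concrete with a short calculation, offered as a sketch alongside the proof in Appendix C and assuming $P_i(0) > 0$ and $R_i > 0$. Taking reciprocals in Eq. (16) turns the variance recursion into an additive one:

$$\frac{1}{P_i(t+1)} = \frac{P_i(t) + R_i}{P_i(t) R_i} = \frac{1}{P_i(t)} + \frac{1}{R_i}, \qquad \text{so} \qquad \frac{1}{P_i(t)} = \frac{1}{P_i(0)} + \frac{t}{R_i}.$$

Combining this with Eq. (13) gives

$$P_i(t+1 \mid t) = P_i(t) = \frac{P_i(0) R_i}{R_i + t P_i(0)} \longrightarrow 0 \qquad (t \to \infty),$$

so the prior variance decays like $R_i/t$: a larger noise variance $R_i$ slows the decay but does not prevent it.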


This property is valuable in fields such as control systems, where we aim to track and predict the state of dynamic systems. As observations accumulate, the prediction variance vanishes for any fixed measurement noise variance $R_i$, although a larger $R_i$ slows the decay.

In essence, this result highlights the power and efficiency of iterative estimation methods 
like the Kalman filter in continually improving the precision of our predictions, making it a 
fundamental tool in various applications, from navigation to signal processing and beyond. 

Because the estimation variance gradually converges to zero, this theorem indirectly implies that the estimates eventually stop fluctuating and settle to a stable value. The critical remaining question is whether this stable value coincides with the true value of the unknown parameter $\tau_i$. The next theorem answers it.


Theorem 7 Given the state equation of the unknown parameter $\tau_i$ in Eq. (10) and the observations defined in Eq. (11), we can apply the Kalman filter to obtain the prior belief of $\tau_i$ at time $t+1$, $t \ge 0$, denoted $\bar{\tau}_i(t+1)$, which equals the updated value $\hat{\tau}_i(t)$ at time $t$.

For random observations (where $Y_i(0), Y_i(1), \dots, Y_i(t-1)$, $t \ge 1$, are random variables), the random variable $\hat{T}_i(t)$, which represents the corresponding random posterior belief about $\tau_i$ at time $t$, converges to the unknown true value $\tau_i^*$, i.e.,

$$\lim_{t \to \infty} \hat{T}_i(t) = \tau_i^*, \qquad \forall i \in N. \qquad (17)$$


Specifically, as time progresses, the estimate $\hat{\tau}_i(t)$ computed from realized observations $y_i(0), y_i(1), \dots, y_i(t-1)$ tends towards the true, unknown value $\tau_i^*$. This convergence demonstrates the effectiveness of the Kalman filter in refining the estimates of unknown parameters so that they progressively approach the actual values they represent.


Proof. The proof is given in Appendix D.
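Why the limit in Theorem 7 is the true value can be seen from a closed form of the estimate, again offered as a sketch alongside the proof in Appendix D. Multiplying Eq. (15) by $1/P_i(t+1)$ and using Eq. (16) gives the information-form recursion

$$\frac{\hat{\tau}_i(t+1)}{P_i(t+1)} = \frac{\hat{\tau}_i(t)}{P_i(t)} + \frac{y_i(t+1)}{R_i}.$$

Starting from the prior $(\tau_i^0, P_i^0)$ held before any signal, and updating with $y_i(0)$ as in the expression for $\hat{\tau}_i(0)$, the posterior belief is a precision-weighted average of the prior and the signals:

$$\hat{\tau}_i(t) = \frac{\tau_i^0/P_i^0 + \frac{1}{R_i}\sum_{k=0}^{t} y_i(k)}{1/P_i^0 + (t+1)/R_i} = \frac{\frac{R_i \tau_i^0}{(t+1) P_i^0} + \frac{1}{t+1}\sum_{k=0}^{t} y_i(k)}{\frac{R_i}{(t+1) P_i^0} + 1}.$$

By Eq. (11), $\frac{1}{t+1}\sum_{k=0}^{t} y_i(k) = \tau_i^* + \frac{1}{t+1}\sum_{k=0}^{t} w_i(k)$. If, in addition to being white, the noise terms are independent, then $\sum_{k} D\left(w_i(k)/(k+1)\right) = \sum_{k} R_i/(k+1)^2 < \infty$, so Proposition 4 together with Kronecker's lemma gives $\frac{1}{t+1}\sum_{k=0}^{t} w_i(k) \to 0$ a.s., and hence $\hat{\tau}_i(t) \to \tau_i^*$ a.s.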


Having described how a player updates its private information about other participants after receiving a signal, we now introduce the methodology for incorporating these updated beliefs into a game-theoretic model.


5 Analysis of Decision-Making Process of Two Asymmetric Players


5.1 Nash Equilibrium with Dynamic Bayesian Updating of Players 


Formally, player $i$'s best response to the control policies $u_{-i}$ is given by

$$BR_i(u_{-i}) = \arg\max_{u_i} EK_i^{\tau_i}(u_i, u_{-i}; S_0).$$


A Nash equilibrium with dynamic Bayesian updating is reached in the game if all players play best responses to each other. This concept adapts Bayesian Nash equilibrium, the central solution concept for Bayesian games, to beliefs that are updated over time. Definition 2 formalizes the idea:


Definition 2 A Nash equilibrium with dynamic Bayesian updating in the game is a strategy profile $(u_1^*, u_2^*)$ such that for each player $i = 1, 2$, each type $\tau_i \in T_i$, and any alternative strategy $u_i$,

$$EK_i^{\tau_i}(u_i^*, u_{-i}^*; S_0) \ge EK_i^{\tau_i}(u_i, u_{-i}^*; S_0), \qquad \forall u_i \in U_i,\ \forall \tau_i \in T_i,\ \forall i = 1, 2.$$


In other words, no player can increase their expected payoff by unilaterally deviating 
from the chosen strategy profile. 


5.2 Solving Game Equilibrium 


We consider asymmetric information between the players: player 1 knows everything about player 2, including the parameter $b_2$ in player 2's payoff function, but player 2 does not know the parameter $\tau_1$ in player 1's payoff function. Therefore, player 2's strategy is based on his belief about $\tau_1$, while player 1's strategy specifies an action for each possible type $\tau_1$ of player 1 at each time $t$.

As mentioned above, given the asymmetric information between the players, we can write the payoff functions of player 1 and player 2 as follows:

$$K_1(u_1(t, \tau_1, \cdot), u_2(t, \cdot)) = \max_{u_1(t, \tau_1, \cdot)} \sum_{t=0}^{T'} \left[ u_1(t, \tau_1, \cdot)\left(a_1 - u_1(t, \tau_1, \cdot) - u_2(t, \cdot)\right) - \tau_1 S_t \right], \qquad (18)$$


where $u_1(t, \tau_1, \cdot)$ is the action specified by the strategy $u_1$ for player 1 of type $\tau_1$ (an action available to him as a player of type $\tau_1$). Player 1 knows every parameter of the game without uncertainty, so his payoff function is a deterministic value rather than a mathematical expectation over an unknown parameter.


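$$EK_2(u_1(t, \tau_1, \cdot), u_2(t, \cdot)) = \max_{u_2(t, \cdot)} \sum_{t=0}^{T'} \sum_{\tau_1 \in T_1} \left[ \left( u_2(t, \cdot)\left(a_2 - u_2(t, \cdot) - u_1(t, \tau_1, \cdot)\right) - b_2 S_t \right) \cdot p_2^t\left(\tau_1 \mid y_1(t-1)\right) \right], \qquad (19)$$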


where player 2's strategy is optimal given his beliefs and the signals $y_1$, and $p_2^t(\tau_1 \mid y_1(t-1))$ is player 2's belief about the unknown parameter $\tau_1$ based on the signal from the previous stage.

Having delineated the game-theoretic model, we now discuss the Nash equilibrium with dynamic Bayesian updating within this framework. We focus on how agents in the game revise their strategies and beliefs over time according to Bayesian principles. In this equilibrium, given the updated beliefs and information, no player has an incentive to deviate unilaterally from their chosen strategy.




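Proposition 8 Consider a Nash equilibrium with dynamic Bayesian updating of two asymmetric players. The optimal control strategies for the players are given by:

$$\begin{aligned} u_1^*(t, \tau_1; \hat{\tau}_1(t), \hat{x}_t) &= \frac{2}{3}a_1 - \frac{1}{3}a_2 + \frac{c_1(t)\hat{x}_t}{2}\tau_1 + \frac{c_1(t)\hat{x}_t}{6}\hat{\tau}_1(t) - \frac{c_1(t)\hat{x}_t}{3} b_2, \\ u_2^*(t; \hat{\tau}_1(t), \hat{x}_t) &= \frac{2}{3}a_2 - \frac{1}{3}a_1 - \frac{c_1(t)\hat{x}_t}{3}\hat{\tau}_1(t) + \frac{2 c_1(t)\hat{x}_t}{3} b_2, \end{aligned} \qquad (20)$$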
= T’-t 

where the coefficient c1(t) is defined as cı (t) = — a , which is less than zero, and 


$t = 0, 1, 2, \dots, T'$. Here, $\tau_1^0$ represents the initial belief of player 2 about $\tau_1$. The constants $a_1$, $a_2$, $b_2$, and $\delta$ satisfy the following inequalities, which ensure that the corresponding controls are non-negative:


> + dr? < a1 < 2a2 + dr? — 2dbo, 


3 
bg < — a2, 
T 


= « a 
ihere-d= Ilos), 
—uo'ð 


Proof. The proof is given in Appendix E.
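The structure of the controls in Proposition 8 can be sanity-checked under an assumption that is ours, not the paper's derivation in Appendix E: that at each stage the controls satisfy linear first-order conditions. Writing $\hat{x}_t$ for the players' estimate of the ecological uncertainty and $s = c_1(t)\hat{x}_t$, we assume player 1 plays $u_1 = \tfrac{1}{2}(a_1 - u_2 + s\tau_1)$, while player 2 predicts $\tilde{u}_1 = \tfrac{1}{2}(a_1 - u_2 + s\hat{\tau}_1(t))$ and plays $u_2 = \tfrac{1}{2}(a_2 - \tilde{u}_1 + s b_2)$. The coefficients $2/3$, $1/3$, $1/2$, $1/6$ of the control formulas are what this system produces. The function names and the values of `tau1_hat`, `c`, and `x_hat` below are hypothetical; $a_1$, $a_2$, $b_2$, $\tau_1$ are taken from the simulation setup.

```python
import math


def equilibrium_controls(a1, a2, b2, tau1, tau1_hat, c, x_hat):
    """Controls of the form in Proposition 8.

    c plays the role of c1(t) < 0 and x_hat of the estimate of ecological
    uncertainty at time t; both are treated here as given numbers.
    """
    s = c * x_hat
    u1 = 2 * a1 / 3 - a2 / 3 + s * tau1 / 2 + s * tau1_hat / 6 - s * b2 / 3
    u2 = 2 * a2 / 3 - a1 / 3 - s * tau1_hat / 3 + 2 * s * b2 / 3
    return u1, u2


def satisfies_assumed_foc(a1, a2, b2, tau1, tau1_hat, c, x_hat):
    """Check (u1, u2) against the assumed linear first-order conditions."""
    u1, u2 = equilibrium_controls(a1, a2, b2, tau1, tau1_hat, c, x_hat)
    s = c * x_hat
    # Player 1 knows its type tau1 and best-responds to the actual u2.
    br1 = (a1 - u2 + s * tau1) / 2
    # Player 2 predicts player 1's response using its belief tau1_hat,
    # then best-responds to that prediction with its own cost b2.
    u1_predicted = (a1 - u2 + s * tau1_hat) / 2
    br2 = (a2 - u1_predicted + s * b2) / 2
    return (math.isclose(u1, br1, abs_tol=1e-12)
            and math.isclose(u2, br2, abs_tol=1e-12))


# a1, a2, b2, tau1 from the simulation setup; tau1_hat, c, x_hat are hypothetical.
print(satisfies_assumed_foc(4.53, 4.0, 1.87, 2.0, 1.5, -0.1, 0.7))
print(equilibrium_controls(4.53, 4.0, 1.87, 2.0, 1.5, -0.1, 0.7))
```

Setting `tau1_hat` equal to `tau1` reduces both controls to the complete-information equilibrium, which matches the intuition behind Theorem 9: once the belief about $\tau_1$ (and the estimate of the ecological uncertainty) converges, so do the controls.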


Theorem 9 (Convergence of Nash equilibrium with dynamic Bayesian updating) In a dynamic game with two asymmetric players employing Bayesian updating, the equilibrium controls $u_1^*(t, \tau_1; \hat{\tau}_1(t), \hat{x}_t)$ and $u_2^*(t; \hat{\tau}_1(t), \hat{x}_t)$ converge to their respective steady-state values as $t \to \infty$.


Proof. The proof is given in Appendix F.


This theorem captures the idea that, in a dynamic game with Bayesian updating, as the players' estimates of the two key parameters, the ecological uncertainty and the cost parameter $\tau_1$, converge independently to their true values, their combined effect leads to the convergence of the players' control strategies.

Although the two estimates converge separately, the control strategies depend on both of them simultaneously. As players gain more accurate insight into the ecological uncertainty and the cost parameters over time, this dual convergence stabilizes their control strategies.


5.3 Impact of Unknown Parameters 


In summary, the proposed method addresses the challenges of estimating uncertainty in both 
the state equation and the payoff function. The process is illustrated in Fig. 1, depicting the 
following steps: 




Fig. 1: Decision-Making Process in the Presence of Uncertainty 



At each time $t = 0, 1, 2, \dots$:

1. Select Control Strategy:

- Make a control or strategy decision based on prior beliefs.
- Players make decisions relying on their prior beliefs about the unknown parameters. This involves $u_i(t, \tau_i, Z(t), \{\bar{\tau}_j(t)\}_{j \in N \setminus \{i\}})$, where $Z(t)$ represents all players' prior estimates of the uncertainty in the state equation, and $\bar{\tau}_j(t)$ denotes player $i$'s estimate of the unknown cost parameter of player $j$ at time $t$.

2. Observe Signal:

- Receive a signal at time $t$, which may include environmental data $x(t)$ or market information $y_j(t)$.

3. Estimate Uncertain Parameters:

- Use the Kalman filter or an alternative estimation method to update beliefs about the uncertain parameters based on the received signal. This yields the posterior belief about the uncertainty at time $t$.

4. Update Beliefs:

- Update beliefs about the uncertain parameters through Bayesian methods. This step combines prior beliefs with the newly estimated values, and the resulting posterior beliefs serve as the prior beliefs for the next stage, $t+1$.




This structured approach guides the decision-making process at each time step, ensuring 
players iteratively adapt their strategies while considering the evolving information about 
uncertain parameters. 


6 Simulation Experiments and Results Analysis 


6.1 Experimental Setup and Parameters 


In the asymmetric-player scenario, player 1 privately knows its true cost, $\tau_1 = 2$, while player 2's cost $b_2 = 1.87$ and the parameters $a_1 = 4.53$ and $a_2 = 4$ are public information known to all. Player 2 does not know player 1's type. We provide a range of comparative results to shed light on different aspects of this scenario. The time interval considered spans from 0 to 100.

We begin with Fig. 2, which shows the signals the players receive and how they evolve over time, providing insight into how players perceive and react to varying degrees of ecological uncertainty.

Moving from the reception of these signals, the study progresses to Fig. 3. Here, we 
explore how players use the received signals to estimate unknown parameters, crucial for 
their strategic planning. This figure reveals the collective effort of the players to converge 
their estimates towards the true value of ecological uncertainty and also the cost uncertainty, 
highlighting a process of learning and adaptation to the game’s dynamics. 

Fig. 4 then examines the variance in players’ estimations. It displays a clear trend of 
decreasing variance over time, signaling an increase in the accuracy of players’ assessments 
regarding ecological uncertainty and cost parameters. This trend is indicative of the players 
reaching a consensus, reflecting their enhanced understanding and collective decision-making 
process in the game. 

The narrative then moves to Fig. 5, which offers a comparative analysis of different 
scenarios in the game. This figure contrasts situations where player 1’s type is common 
knowledge with those where it remains undisclosed, examining how these different levels of 
information affect the stability of pollution control strategies. 

Fig. 6 continues the exploration of strategic choices, comparing the strategies for different 
types of player 1. This figure particularly looks at how the chosen pollution control level 
changes according to player 1’s type, shedding light on the impact of player types on pollution 
control decisions. 

Lastly, Fig. 7 presents a contrast of player 2’s optimal control strategies under varying 
levels of information about player 1’s type. This figure highlights the significant impact of 
information availability on player 2’s decision-making process and how their strategies adapt 
in the face of uncertainties about player 1’s type. 

Through this structured narrative, we journey from the initial reception of ecological 
uncertainty signals to the evolution of players’ strategic responses, offering a comprehensive 
view of the decision-making dynamics in pollution control games. 


6.2 Analysis of Experimental Results 


Fig. 2 is the starting point of our analysis of strategic decision-making under uncertainty. In this game, players face continually evolving environmental conditions and must adapt their strategies accordingly. The signals shown in this figure are the inputs from which players form their beliefs about ecological uncertainty and costs. Analyzing them shows what information players perceive, interpret, and react to, and lays the foundation for the subsequent analysis.




(a) Evolution of Ecological Uncertainty Signals (b) Dynamics of Signals: Player 2’s Reception of 
Player 1’s Control Cost 


Fig. 2: Signal Analysis in Pollution Control Games 


Fig. 2 consists of two panels, each providing insight into the dynamics of pollution control games. Fig. 2a shows the actual observations of ecological uncertainty received by the players at each moment. These observations are crucial in pollution control because they reflect real-time changes in environmental conditions, such as fluctuations in pollution levels or the immediate responses of ecosystems, and are therefore vital for formulating flexible and adaptive pollution control strategies. Fig. 2b shows the noisy signals about player 1's cost parameter $\tau_1$ received by player 2. Because the signals are noisy, player 2's estimates of player 1's pollution reduction costs may contain errors, and this incomplete information is critical when player 2 formulates its pollution control strategy.




(a) Comparison of Ecological Uncertainty Estimation in Motion Equation (b) Comparison of Player 2's Estimation of Player 1's Cost Parameter

Fig. 3: Analysis of Dynamic System Parameter Estimation and Truth Value Comparison


In Fig. 3a, we observe that the player’s understanding of environmental impacts steadily 
aligns with the actual scenario. This indicates that over time, the player is improving their 
grasp of the ecological challenges faced in pollution control. The gradual alignment with 
reality may represent the player’s ongoing learning and adaptation to environmental changes. 

In Fig. 3b, player 2's assessment of player 1's pollution management costs progressively matches the real costs. This convergence demonstrates the effectiveness of our approach in determining the actual costs of pollution control strategies, even though player 2 starts from a different initial assumption. Over time, player 2's perception increasingly agrees with the true cost of player 1's pollution control measures, which underscores the value of our method.

Within the broader context of the pollution control game, the analysis of these two 
figures provides insights into the dynamic process of updating beliefs and strategies in 
response to evolving environmental conditions. The players’ increasingly precise evaluations 
reflect the learning and coordination processes within the game, crucial for effective pollution 
management. 




(a) Variance in Estimation of Ecological Uncertainty (b) Variance in Player 2's Estimation of Player 1's Cost Parameter


Fig. 4: Advanced Variance Analysis in Ecological Uncertainty and Cost Estimation 


Fig. 4a and Fig. 4b show the same consistent trend: over time, the variance of the players' estimates steadily narrows and approaches zero. This suggests that in pollution control games, as players continually observe and adapt to new information, their predictions about environmental challenges and pollution control costs become increasingly precise, moving towards a shared understanding.

This trend can be explained by the ongoing accumulation of information by players 
during the game’s dynamic unfolding. As they engage more with the game and its evolving 
scenarios, disparities in their understanding of the ecological uncertainties and associated 
costs gradually diminish. With time, players likely develop a more nuanced grasp of the 
system’s dynamics and the actions of other players. This improved understanding leads to 
enhanced accuracy in their estimations about environmental factors and the financial aspects 
of pollution control. Such a convergence of perceptions is critical for effective decision-making 
and strategy formulation in the complex arena of pollution control. 




Fig. 5: Comparative Charts for Common vs. Undisclosed Player Types 


Fig. 5 presents a comparative analysis of pollution control strategies in two distinct 
scenarios within the game: one where player 1’s type is common knowledge and another 
where it remains undisclosed to player 2. When player 1’s type is known to all, the pollution 
control strategies tend to be more stable. This stability stems from the complete availability 
of crucial information, with both players operating as rational and well-informed agents. 

Conversely, when player 1's type is unknown to player 2, the pollution control strategies vary noticeably more. In this situation, player 2 must rely on the signals it receives to infer player 1's type. This estimate is not static but evolves as the game progresses, which player 1 knows. As a result, the strategies in this scenario are more prone to fluctuations.




Fig. 6: Strategy Comparisons for Different Player 1 Types 


Fig. 6 presents a strategic analysis of the pollution control game for different types of player 1. It compares the optimal strategies adopted by player 1 when its type is 2 or 2.5. In this scenario, the control level (emission) chosen by player 1 is inversely related to its type: a higher type, such as $\tau_1 = 2.5$, leads player 1 to choose a lower level of pollution emission. This is consistent with the interpretation of $\tau_1$ as the cost associated with reducing pollution. When the cost of mitigating pollution is higher, player 1 is inclined to emit less pollution, indicating a greater commitment to pollution control efforts.

This strategic comparison is crucial for understanding how the type of player 1, repre- 
senting their pollution abatement cost, influences their control decisions and the subsequent 
environmental impact. It demonstrates the sensitivity of control strategies to cost factors, 
underlining the significance of players’ types in determining their approaches to pollution 
control. The analysis thereby offers valuable insights into how economic considerations, 
reflected in the type of a player, drive strategic decisions in pollution control games, 
impacting both individual behavior and broader environmental outcomes. 




Fig. 7: Player 2’s Optimal Control: Known vs. Unknown Player 1’s Type 


Fig. 7 provides a comparative analysis of player 2’s optimal control strategies in the 
context of varying levels of information about player 1’s type in a pollution control game. 
It juxtaposes the situations where player 2 is fully informed about player 1’s type against 
scenarios where they lack this knowledge. The findings show that player 2’s control strategy 
remains stable and consistent when they are aware of player 1’s type. In this fully informed 
scenario, player 2 can make strategic decisions based on complete information, eliminating 
the need for guesswork or adjustments in their perception of player 1’s type. 

Conversely, when player 2 does not have information about player 1’s type, they must 
adapt their strategies based on the signals they receive. This lack of knowledge necessitates 
continuous updating of beliefs about player 1’s type, which in turn influences player 2’s 
approach to controlling pollution emissions. 

This comparison highlights the significant impact of information availability on player 
2’s decision-making process. When faced with uncertainty about player 1’s type, player 2’s 
strategies become more dynamic, reflecting the challenges of making decisions in environ- 
ments with incomplete information. The analysis underscores the complexity inherent in 
pollution control decisions, particularly in scenarios where players do not have full knowledge 
of each other’s strategies or costs, emphasizing the importance of information in strategic 
environmental management. 


7 Conclusion 


7.1 Summarizing Research Findings 


In this study, we investigated decision-making in the context of environmental regulation 
under asymmetric information. We explored how rational players update their beliefs, make 
strategic decisions, and adapt to evolving information. Our findings provide valuable insights 
into the dynamics of decision-making in scenarios where players have varying degrees of 
information. 


7.2 Contributions and Limitations of the Paper 


Our research contributes to the understanding of decision dynamics in asymmetric information settings. The key contributions include:
1. Belief Evolution: We demonstrated that rational players can refine their beliefs over time, even when information is initially uncertain or asymmetric. The evolution of beliefs plays a crucial role in guiding players' actions.




2. Strategic Adaptations: We highlighted how the availability of information influences players' strategies. When common knowledge is present, strategies tend to be stable, while undisclosed types lead to more adaptive and fluctuating strategies.

3. Information Accumulation: The diminishing variance over time underscores the role 
of information accumulation in reducing uncertainty and improving the accuracy of 
decisions. 

However, our study also has limitations. One notable limitation is the assumption that the variance of the signals received by player 2 is fixed and known. In practice, this variance may not be known with certainty, which could make it significantly harder for player 2 to estimate the type of player 1. When the variance is unknown, non-convergence and more complex decision dynamics may result. Future research should consider these uncertainties to provide a more general and realistic understanding of decision dynamics in environmental regulation.


7.3 Recommendations for Future Research Directions 


In future research, we aim to expand the applicability of our methods to continuous-time sce- 
narios, offering more realistic representations of dynamic decision processes over time. This 
extension will allow us to capture intricate real-world dynamics that evolve continuously, 
enhancing the model’s relevance to various dynamic systems. 

Furthermore, we intend to explore the broader utility of our parameter estimation tech- 
niques across a variety of problem domains. Our current study focuses on environmental 
regulation, but we believe that our methodology has the potential to benefit a wide range 
of decision-making contexts where players must grapple with uncertainties and information 
gaps. These could encompass fields such as economics, finance, healthcare, and more. 

By continuing to develop and apply our methodology in these directions, we aim to 
provide a robust and flexible tool set that players can utilize to enhance their understanding 
of complex systems and make more informed choices in the face of uncertainty. 


References 


[1] Shoham, Y., Leyton-Brown, K.: Multiagent Systems: Algorithmic, Game-theoretic, and Logical Foundations. Cambridge University Press, Cambridge (2008)

[2] Kumar, P., Van Schuppen, J.: On Nash equilibrium solutions in stochastic dynamic games. IEEE Transactions on Automatic Control 25(6), 1146-1149 (1980)

[3] Haurie, A., Krawczyk, J.B., Zaccour, G.: Games and Dynamic Games vol. 1. World Scientific Publishing Company, Singapore (2012)

[4] Schofield, N.: Instability of simple dynamic games. The Review of Economic Studies 45(3), 575-594 (1978)

[5] Lewis, F.L., Vrabie, D., Syrmos, V.L.: Optimal Control. John Wiley & Sons, Hoboken, New Jersey (2012)

[6] Zhang, C., Gholami, S., Kar, D., Sinha, A., Jain, M., Goyal, R., Tambe, M.: Keeping pace with criminals: An extended study of designing patrol allocation against adaptive opportunistic criminals. Games 7(3), 15 (2016)

[7] Bloembergen, D., Tuyls, K., Hennes, D., Kaisers, M.: Evolutionary dynamics of multi-agent learning: A survey. Journal of Artificial Intelligence Research 53, 659-697 (2015)




[8] Masoudi, N., Santugini, M., Zaccour, G.: A dynamic game of emissions pollution with uncertainty and learning. Environmental and Resource Economics 64, 349-372 (2016)

[9] Koulovatianos, C., Mirman, L.J., Santugini, M.: Optimal growth and uncertainty: Learning. Journal of Economic Theory 144(1), 280-295 (2009)

[10] Mirman, L.J., Santugini, M.: Learning and technological progress in dynamic games. Dynamic Games and Applications 4, 58-72 (2014)

[11] Prokhorov, Y.V.: Convergence of random processes and limit theorems in probability theory. Theory of Probability & Its Applications 1(2), 157-214 (1956)

[12] Zamir, S.: Bayesian Games: Games with Incomplete Information. Springer (2020)

[13] Huang, L., Zhu, Q.: Analysis and computation of adaptive defense strategies against advanced persistent threats for cyber-physical systems. In: Decision and Game Theory for Security: 9th International Conference, GameSec 2018, Seattle, WA, USA, October 29-31, 2018, Proceedings 9, pp. 205-226. Springer (2018)

[14] Welch, G., Bishop, G., et al.: An introduction to the Kalman filter (1995)

[15] Breton, M., Zaccour, G., Zahaf, M.: A differential game of joint implementation of environmental projects. Automatica 41(10), 1737-1749 (2005)

[16] Elsawah, S., Pierce, S.A., Hamilton, S.H., Van Delden, H., Haase, D., Elmahdi, A., Jakeman, A.J.: An overview of the system dynamics process for integrated modelling of socio-ecological systems: Lessons on good modelling practice from five case studies. Environmental Modelling & Software 93, 127-145 (2017)

[17] Qin, M., Sun, M., Li, J.: Impact of environmental regulation policy on ecological efficiency in four major urban agglomerations in eastern China. Ecological Indicators 130, 108002 (2021)

[18] Ummenhofer, C.C., Meehl, G.A.: Extreme weather and climate events with ecological relevance: a review. Philosophical Transactions of the Royal Society B: Biological Sciences 372(1723), 20160135 (2017)

[19] Belkaoui, A.: The impact of the disclosure of the environmental effects of organizational behavior on the market. Financial Management, 26-31 (1976)

[20] Stavins, R.N.: Transaction costs and tradeable permits. Journal of Environmental Economics and Management 29(2), 133-148 (1995)

[21] Shokri, R.: Privacy games: Optimal user-centric data obfuscation. arXiv preprint arXiv:1402.3426 (2014)

[22] Lopez, V.G., Wan, Y., Lewis, F.L.: Bayesian graphical games for synchronization in networks of dynamical systems. IEEE Transactions on Control of Network Systems 7(2), 1028-1039 (2019)

[23] Ieong, S., Shoham, Y.: Bayesian coalitional games. In: AAAI, pp. 95-100 (2008)

[24] Bingham, N.H.: The work of A.N. Kolmogorov on strong limit theorems. Theory of Probability & Its Applications 34(1), 129-139 (1990)




[25] Meinhold, R.J., Singpurwalla, N.D.: Understanding the Kalman filter. The American Statistician 37(2), 123-127 (1983)


A Proof of Theorem 3 


In our case, $X_0, X_1, X_2, \ldots$ is an infinite sequence of independent and identically distributed (i.i.d.) Lebesgue-integrable random variables. As a result, we can write

$$
\mu_t = \frac{\kappa_0\mu_0 + X_0 + X_1 + \cdots + X_{t-1}}{\kappa_0 + t}
= \frac{\kappa_0\mu_0}{\kappa_0 + t} + \frac{t}{\kappa_0 + t}\cdot\frac{X_0 + X_1 + \cdots + X_{t-1}}{t}. \tag{21}
$$

According to the strong law of large numbers,

$$
\lim_{t\to\infty}\frac{X_0 + X_1 + \cdots + X_{t-1}}{t} = \mu \quad \text{almost surely.}
$$

Taking the limit of (21) with respect to $t$, we obtain

$$
\lim_{t\to\infty}\mu_t = 0 + 1\cdot\mu = \mu.
$$

This shows that the estimate $\mu_t$ converges to the true value $\mu$.
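As a quick numerical illustration of (21), the following minimal sketch simulates i.i.d. signals and tracks $\mu_t$. The normal distribution, the true mean 0.8, and the prior values `mu0 = 5.0`, `kappa0 = 2.0` are assumptions chosen only for illustration; the function name is hypothetical.

```python
import random


def posterior_mean_path(mu0, kappa0, signals):
    """Return [mu_0, mu_1, ..., mu_t] with mu_t = (kappa0*mu0 + X_0 + ... + X_{t-1}) / (kappa0 + t)."""
    total = kappa0 * mu0          # running value of kappa0*mu0 plus all signals seen so far
    path = [mu0]
    for t, x in enumerate(signals, start=1):
        total += x
        path.append(total / (kappa0 + t))
    return path


# Illustrative run: the prior mean 5.0 is far from the assumed true mean 0.8.
rng = random.Random(1)
signals = [rng.gauss(0.8, 0.3) for _ in range(20000)]
path = posterior_mean_path(5.0, 2.0, signals)
print(path[10], path[-1])         # the last value should be close to 0.8
```

The prior weight $\kappa_0/(\kappa_0+t)$ decays like $1/t$, so the influence of a badly chosen $\mu_0$ disappears at the same rate as the sampling noise.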


B Proof of Theorem 5 


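If we rewrite the iterative equation (8) for $\beta_t$ and $\mu_t$ in terms of the initial beliefs $\kappa_0$, $\mu_0$, $\beta_0$ and the historical signals, we can derive that

$$
\begin{aligned}
\beta_t &= \beta_0 + \sum_{m=0}^{t-1}\frac{(\kappa_0+m)(X_m-\mu_m)^2}{2(\kappa_0+m+1)} \\
&= \beta_0 + \frac{1}{2}\sum_{m=0}^{t-1}\left(1 - \frac{1}{\kappa_0+m+1}\right)(X_m-\mu_m)^2 \\
&= \beta_0 + \frac{1}{2}\sum_{m=0}^{t-1}\left(1 - \frac{1}{\kappa_0+m+1}\right)\left(X_m^2 - 2X_m\mu_m + \mu_m^2\right),
\end{aligned} \tag{22}
$$

where

$$
\mu_m = \frac{1}{\kappa_0+m}\left(\kappa_0\mu_0 + X_0 + X_1 + \cdots + X_{m-1}\right). \tag{23}
$$

First, we prove that

$$
\lim_{t\to\infty}\frac{\beta_t}{t} \ \text{is finite.}
$$

Because the expression for $\beta_t$ is complex, we divide the proof into several steps and show that each component of $\beta_t/t$ converges to a finite limit as $t\to\infty$.

Step 1: We want to demonstrate that

$$
\lim_{t\to\infty}\frac{\beta_0}{t} = 0.
$$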




This is straightforward because $\beta_0$ is a positive, finite constant. As $t$ grows without bound, the denominator of the fraction $\beta_0/t$ becomes arbitrarily large while the numerator remains unchanged, so the fraction converges to zero. Therefore,

$$
\lim_{t\to\infty}\frac{\beta_0}{t} = 0.
$$


Step 2: We consider the sum of squared observations and wish to show that

$$
\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}X_m^2
$$

is finite. Since the random variables $X_0, X_1, \ldots$ are independent and identically distributed with $E(X_m) = \mu$ and $D(X_m) = \sigma^2$, we have $E(X_m^2) = D(X_m) + [E(X_m)]^2 = \sigma^2 + \mu^2$.

The variables $X_m^2$ are also i.i.d. and integrable, so by the strong law of large numbers their average converges almost surely to $E(X_m^2)$ as $t\to\infty$:

$$
\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}X_m^2 = E(X_m^2) = \sigma^2 + \mu^2,
$$

which is finite, establishing the existence and finiteness of the limit.
Step 3: We aim to show that the limit

$$
\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}\frac{X_m^2}{\kappa_0+m+1}
$$

is finite. We apply Proposition 4, adapted to our context of independent random variables whose series of means and variances converge.

For any fixed positive integer $t > 1$, consider the random variables

$$
Z_m^{(1)} = \frac{X_m^2}{t(\kappa_0+m+1)}, \qquad m = 0, 1, \ldots, t-1.
$$

Since the $Z_m^{(1)}$ are not identically distributed, we cannot use the law of large numbers as in Step 2; instead, we use Proposition 4 to prove the convergence of their sum. By Proposition 4, it suffices to establish the following for $Z_m^{(1)}$:

1. $\lim_{t\to\infty}\sum_{m=0}^{t-1}D(Z_m^{(1)})$ is finite.
2. $\lim_{t\to\infty}\sum_{m=0}^{t-1}E(Z_m^{(1)})$ is finite.

Given $E(X_m^2) = \mu^2 + \sigma^2$ and $D(X_m^2) = 2\sigma^4 + 4\mu^2\sigma^2$ (assuming $X_m$ is normally distributed), we have, for the means,

$$
E(Z_m^{(1)}) = \frac{1}{t(\kappa_0+m+1)}E(X_m^2) = \frac{\mu^2+\sigma^2}{t(\kappa_0+m+1)},
$$

which is finite for all $m$ and $t$ because $\mu$ and $\sigma$ are finite and $\kappa_0 > 0$.

The ratio of successive terms, $E(Z_{m+1}^{(1)})/E(Z_m^{(1)}) = 1 - 1/(\kappa_0+m+2)$, is less than 1 but tends to 1, so the ratio test is inconclusive here; indeed, the full series $\sum_{m=0}^{\infty}1/(\kappa_0+m+1)$ diverges like the harmonic series. The partial sums, however, grow only logarithmically. Since $\kappa_0 > 0$, we have $1/(\kappa_0+m+1) < 1/(m+1)$, and therefore

$$
\sum_{m=0}^{t-1}E(Z_m^{(1)}) < \frac{\mu^2+\sigma^2}{t}\sum_{j=1}^{t}\frac{1}{j} \le \frac{(\mu^2+\sigma^2)(1+\ln t)}{t} \longrightarrow 0 \quad (t\to\infty).
$$

For the variances,

$$
D(Z_m^{(1)}) = \frac{1}{t^2(\kappa_0+m+1)^2}D(X_m^2) = \frac{2\sigma^4+4\mu^2\sigma^2}{t^2(\kappa_0+m+1)^2},
$$

which is also finite for all $m$ and $t$ and decreases as $m$ grows because the denominator grows like $m^2$. By comparison with the convergent series $\sum_{j\ge 1}1/j^2 = \pi^2/6$,

$$
\sum_{m=0}^{t-1}D(Z_m^{(1)}) < \frac{2\sigma^4+4\mu^2\sigma^2}{t^2}\cdot\frac{\pi^2}{6} \longrightarrow 0 \quad (t\to\infty).
$$

In conclusion, the sums of means and of variances both converge, and Proposition 4 implies that the limit

$$
\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}\frac{X_m^2}{\kappa_0+m+1}
$$

is almost surely finite.
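The bounds used in Step 3 can be checked numerically. The sketch below, assuming normally distributed signals and illustrative parameter values ($\mu = 1$, $\sigma = 0.5$, $\kappa_0 = 1$), computes the exact sum of means $\sum_{m<t}E(Z_m^{(1)})$, compares it with the bound $(\mu^2+\sigma^2)(1+\ln t)/t$, and evaluates one simulated realization of the weighted average. All function names are hypothetical helpers.

```python
import math
import random


def exact_mean_sum(t, mu, sigma, kappa0):
    """sum_{m<t} E(Z_m^(1)) = (mu^2 + sigma^2)/t * sum_{m<t} 1/(kappa0 + m + 1)."""
    return (mu**2 + sigma**2) / t * sum(1.0 / (kappa0 + m + 1) for m in range(t))


def mean_bound(t, mu, sigma):
    """Harmonic-number bound (mu^2 + sigma^2)(1 + ln t)/t from Step 3."""
    return (mu**2 + sigma**2) * (1 + math.log(t)) / t


def weighted_square_average(t, mu, sigma, kappa0, seed=0):
    """One realization of (1/t) * sum_{m<t} X_m^2 / (kappa0 + m + 1), X_m ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for m in range(t):
        x = rng.gauss(mu, sigma)
        total += x * x / (kappa0 + m + 1)
    return total / t


for t in (10, 1000, 100000):
    print(t, exact_mean_sum(t, 1.0, 0.5, 1.0), mean_bound(t, 1.0, 0.5))
```

The printed sums shrink roughly like $\ln t / t$, which is the behavior the corrected argument relies on.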
Step 4: We seek to prove that the cross-product term

$$
\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}X_m\mu_m \tag{24}
$$

is finite. Utilizing Proposition 4, for any given $t > 1$ we define a new sequence of random variables

$$
Z_m^{(2)} = \frac{X_m S'_m}{t(\kappa_0+m)}, \qquad \text{where } S'_m = \kappa_0\mu_0 + \sum_{i=0}^{m-1}X_i,
$$

so that $\mu_m = S'_m/(\kappa_0+m)$ by (23). Because $X_m$ is independent of $S'_m$, the expectation of $Z_m^{(2)}$ is

$$
E(Z_m^{(2)}) = \frac{\mu(\kappa_0\mu_0 + m\mu)}{t(\kappa_0+m)}.
$$

This expectation is finite for all finite $m$ and $t$. For fixed $t$ it remains bounded in $m$ (it tends to $\mu^2/t$ as $m\to\infty$), and it tends to zero as $t\to\infty$. Consequently, the expectation of $Z_m^{(2)}$ is finite for all values of $m$ and $t$.




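To establish the convergence of $\lim_{t\to\infty}\sum_{m=0}^{t-1}E(Z_m^{(2)})$, we first simplify the expression for $E(Z_m^{(2)})$. For this, we define a sequence $\{a_m^{(2)}\}$ by

$$
a_m^{(2)} = \frac{\mu(\kappa_0\mu_0 + m\mu)}{\kappa_0+m},
$$

so that $E(Z_m^{(2)}) = a_m^{(2)}/t$. Observing the sequence $a_m^{(2)}$, we find that

$$
\lim_{m\to\infty}a_m^{(2)} = \mu^2.
$$

Applying the Cesàro mean theorem yields

$$
\lim_{t\to\infty}\sum_{m=0}^{t-1}E(Z_m^{(2)}) = \lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}a_m^{(2)} = \mu^2.
$$

Hence, the series of means is convergent.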
Next, we prove the second condition of Proposition 4. For any $m \ge 0$ and $t > 0$, the variance of $Z_m^{(2)}$, denoted by $D(Z_m^{(2)})$, is finite. Using the independence of $X_m$ and $S'_m$, together with $E(S'_m) = \kappa_0\mu_0 + m\mu$ and $D(S'_m) = m\sigma^2$, we find that

$$
D(Z_m^{(2)}) = \frac{\sigma^2\left(m(\sigma^2+\mu^2) + (\kappa_0\mu_0+m\mu)^2\right)}{t^2(\kappa_0+m)^2},
$$

where $\sigma$, $\mu$, $\kappa_0$, and $\mu_0$ are non-negative constants, $t > 0$, and $m \ge 0$.

To establish the convergence of $\lim_{t\to\infty}\sum_{m=0}^{t-1}D(Z_m^{(2)})$, we first simplify the expression for $D(Z_m^{(2)})$. For this purpose, we introduce a sequence $\{b_m^{(2)}\}$ defined by

$$
b_m^{(2)} = \frac{\sigma^2\left(m(\sigma^2+\mu^2) + (\kappa_0\mu_0+m\mu)^2\right)}{(\kappa_0+m)^2}.
$$

Note that $\{b_m^{(2)}\}$ is a numerical sequence, not a sequence of random variables. This allows us to express $D(Z_m^{(2)})$ as

$$
D(Z_m^{(2)}) = \frac{b_m^{(2)}}{t^2}.
$$

Finally, we demonstrate that the limit

$$
\lim_{t\to\infty}\frac{1}{t^2}\sum_{m=0}^{t-1}b_m^{(2)}
$$

exists and is finite. This is the crucial step in ensuring that the cross-product term under consideration is finite. For the sequence $b_m^{(2)}$, we observe that

$$
\lim_{m\to\infty}b_m^{(2)} = \sigma^2\mu^2.
$$

By the Cesàro mean theorem,

$$
\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}b_m^{(2)} = \sigma^2\mu^2 < \infty.
$$




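Since $\lim_{t\to\infty}1/t = 0$, we have

$$
\lim_{t\to\infty}\frac{1}{t^2}\sum_{m=0}^{t-1}b_m^{(2)} = \left(\lim_{t\to\infty}\frac{1}{t}\right)\left(\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}b_m^{(2)}\right) = 0\cdot\sigma^2\mu^2 = 0.
$$

Therefore,

$$
\lim_{t\to\infty}\sum_{m=0}^{t-1}D(Z_m^{(2)}) < \infty.
$$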


Upon demonstrating that the series comprising the expected values and the variances 
of the sequence of random variables converge to finite sums, we infer, in accordance with 
Proposition 4, that the summation of random variables as delineated by Equation (24) is 
almost surely convergent to a finite limit. 
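A short Monte Carlo sketch can illustrate the Step 4 result. Assuming normal signals and illustrative values ($\mu = 1.5$, $\sigma = 0.5$, $\mu_0 = 0$, $\kappa_0 = 1$), it computes one realization of $\frac{1}{t}\sum_{m<t}X_m\mu_m$ and the deterministic Cesàro mean of $a_m^{(2)}$; both should be close to $\mu^2 = 2.25$. The helper names are hypothetical.

```python
import random


def cross_term_average(t, mu, sigma, mu0, kappa0, seed=0):
    """One realization of (1/t) * sum_{m<t} X_m * mu_m, with mu_m = S'_m / (kappa0 + m)."""
    rng = random.Random(seed)
    s_prime = kappa0 * mu0             # S'_0 = kappa0 * mu0
    total = 0.0
    for m in range(t):
        mu_m = s_prime / (kappa0 + m)  # mu_m uses only X_0, ..., X_{m-1}
        x_m = rng.gauss(mu, sigma)
        total += x_m * mu_m
        s_prime += x_m                 # S'_{m+1} = S'_m + X_m
    return total / t


def cesaro_mean_a2(t, mu, mu0, kappa0):
    """(1/t) * sum_{m<t} a_m^(2), where a_m^(2) = mu * (kappa0*mu0 + m*mu) / (kappa0 + m)."""
    return sum(mu * (kappa0 * mu0 + m * mu) / (kappa0 + m) for m in range(t)) / t


print(cross_term_average(100000, 1.5, 0.5, 0.0, 1.0), cesaro_mean_a2(100000, 1.5, 0.0, 1.0))
```

Computing $\mu_m$ before drawing $X_m$ mirrors the independence of $X_m$ and $S'_m$ that the expectation formula uses.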

Step 5: For the weighted cross-product term, we show that

$$
\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}\frac{X_m\mu_m}{\kappa_0+m+1}
$$

is finite. We define the random variables $Z_m^{(3)}$ by

$$
Z_m^{(3)} = \frac{X_m S'_m}{t(\kappa_0+m)(\kappa_0+m+1)}, \tag{25}
$$

where $X_m$ and $S'_m$ keep their previous definitions. The expectation of $Z_m^{(3)}$ is

$$
E(Z_m^{(3)}) = \frac{m\mu^2 + \kappa_0\mu_0\mu}{t(\kappa_0+m)(\kappa_0+m+1)}. \tag{26}
$$

This quantity is finite for all finite $m$ and $t$ and approaches zero as either $m$ or $t$ approaches infinity, so it remains finite over the whole domain of $m$ and $t$.

To prove the convergence of the series of expected values, namely $\lim_{t\to\infty}\sum_{m=0}^{t-1}E(Z_m^{(3)})$, we define the sequence

$$
a_m^{(3)} = \frac{m\mu^2 + \kappa_0\mu_0\mu}{(\kappa_0+m)(\kappa_0+m+1)},
$$

which converges to zero. Since $E(Z_m^{(3)}) = a_m^{(3)}/t$, the Cesàro mean theorem gives $\lim_{t\to\infty}\sum_{m=0}^{t-1}E(Z_m^{(3)}) = \lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}a_m^{(3)} = 0$, which confirms the convergence of the series of expected values.

For the variance $D(Z_m^{(3)})$, we find

$$
D(Z_m^{(3)}) = \frac{\sigma^2\left[(\sigma^2+\mu^2)m + (\kappa_0\mu_0+m\mu)^2\right]}{t^2(\kappa_0+m+1)^2(\kappa_0+m)^2}. \tag{27}
$$

To streamline the expression, we introduce the sequence

$$
b_m^{(3)} = \frac{\sigma^2\left[(\sigma^2+\mu^2)m + (\kappa_0\mu_0+m\mu)^2\right]}{(\kappa_0+m+1)^2(\kappa_0+m)^2}, \tag{28}
$$

which reduces $D(Z_m^{(3)})$ to $b_m^{(3)}/t^2$.




To prove that $\lim_{t\to\infty}\sum_{m=0}^{t-1}D(Z_m^{(3)})$ is finite, consider the following.

First, the sequence $b_m^{(3)}$ converges to 0 as $m\to\infty$, because its numerator grows like $m^2$ while its denominator grows like $m^4$. Hence, for any arbitrarily small $\epsilon > 0$, there exists a positive integer $N$ such that $|b_m^{(3)}| < \epsilon$ for all $m > N$. By the Cesàro mean theorem,

$$
\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}b_m^{(3)} = 0.
$$

Further, we can express the limit of the sum of variances as

$$
\lim_{t\to\infty}\sum_{m=0}^{t-1}D(Z_m^{(3)}) = \lim_{t\to\infty}\frac{1}{t^2}\sum_{m=0}^{t-1}b_m^{(3)}.
$$

Since $\lim_{t\to\infty}1/t = 0$ and the Cesàro limit above exists, we may take the limits of $1/t$ and of $\frac{1}{t}\sum_{m=0}^{t-1}b_m^{(3)}$ separately: when both limits exist, the limit of the product is the product of the limits. Therefore,

$$
\lim_{t\to\infty}\sum_{m=0}^{t-1}D(Z_m^{(3)}) = \lim_{t\to\infty}\frac{1}{t}\times\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}b_m^{(3)} = 0\times 0 = 0,
$$

so the limit is finite.
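Because $E(Z_m^{(3)})$ and $b_m^{(3)}$ are deterministic, the Step 5 limits can be checked directly. The following sketch evaluates (26) and (28) for illustrative parameter values ($\mu = 1.5$, $\sigma = 0.5$, $\mu_0 = 0.5$, $\kappa_0 = 1$), which are assumptions rather than values from the paper; the helper names are hypothetical.

```python
def sum_mean_z3(t, mu, mu0, kappa0):
    """sum_{m<t} E(Z_m^(3)), using (26)."""
    return sum(
        (m * mu**2 + kappa0 * mu0 * mu) / (t * (kappa0 + m) * (kappa0 + m + 1))
        for m in range(t)
    )


def b3(m, mu, sigma, mu0, kappa0):
    """b_m^(3) from (28)."""
    num = sigma**2 * ((sigma**2 + mu**2) * m + (kappa0 * mu0 + m * mu) ** 2)
    return num / ((kappa0 + m + 1) ** 2 * (kappa0 + m) ** 2)


def sum_var_z3(t, mu, sigma, mu0, kappa0):
    """sum_{m<t} D(Z_m^(3)) = (1/t^2) * sum_{m<t} b_m^(3)."""
    return sum(b3(m, mu, sigma, mu0, kappa0) for m in range(t)) / t**2


params = dict(mu=1.5, mu0=0.5, kappa0=1.0)
for t in (100, 10000):
    print(t, sum_mean_z3(t, **params), sum_var_z3(t, sigma=0.5, **params))
```

Both sums shrink as $t$ grows, in line with the zero limits derived above.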

Step 6: We study the sum of the squared means and prove that $\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}\mu_m^2$ is finite.

Consider the random variable $Z_m^{(4)} = \frac{S'^2_m}{t(\kappa_0+m)^2}$. Assuming, as before, that the $X_m$ are normally distributed, $S'_m$ is normal with mean $\kappa_0\mu_0 + m\mu$ and variance $m\sigma^2$, so the expectation and variance of $Z_m^{(4)}$ are

$$
E(Z_m^{(4)}) = \frac{m\sigma^2 + (\kappa_0\mu_0+m\mu)^2}{t(\kappa_0+m)^2}, \tag{29}
$$

$$
D(Z_m^{(4)}) = \frac{1}{t^2(\kappa_0+m)^4}\left[2(m\sigma^2)^2 + 4m\sigma^2(\kappa_0\mu_0+m\mu)^2\right]. \tag{30}
$$

Both $E(Z_m^{(4)})$ and $D(Z_m^{(4)})$ are finite for all finite $m$ and $t$. As $m\to\infty$, $E(Z_m^{(4)})$ tends to $\mu^2/t$ and $D(Z_m^{(4)})$ tends to zero, because the denominator of (30) is quartic in $m$ while its numerator is only cubic; as $t\to\infty$, both tend to zero. Hence the expectation and variance remain bounded for all values of $m$ and $t$.

Using the methodology outlined above, we can similarly prove that $\lim_{t\to\infty}\sum_{m=0}^{t-1}E(Z_m^{(4)})$ and $\lim_{t\to\infty}\sum_{m=0}^{t-1}D(Z_m^{(4)})$ are finite. Indeed, $tE(Z_m^{(4)})$ converges to $\mu^2$ and $t^2D(Z_m^{(4)})$ converges to zero as $m\to\infty$, so by the Cesàro mean theorem and the properties of limits, the sum of expectations tends to $\mu^2$ and the sum of variances tends to zero.

Step 7: Finally, for the weighted sum of squared means, we show that $\lim_{t\to\infty}\frac{1}{t}\sum_{m=0}^{t-1}\frac{\mu_m^2}{\kappa_0+m+1}$ is finite.

We consider the random variable

$$
Z_m^{(5)} = \frac{S'^2_m}{t(\kappa_0+m)^2(\kappa_0+m+1)},
$$

for which the expectation is

$$
E(Z_m^{(5)}) = \frac{m\sigma^2 + (\kappa_0\mu_0+m\mu)^2}{t(\kappa_0+m)^2(\kappa_0+m+1)}
$$

and the variance is

$$
D(Z_m^{(5)}) = \frac{1}{t^2(\kappa_0+m)^4(\kappa_0+m+1)^2}\left[2(m\sigma^2)^2 + 4m\sigma^2(\kappa_0\mu_0+m\mu)^2\right].
$$

By reasoning similar to that of Step 6, both the expectation and the variance of $Z_m^{(5)}$ are finite. Moreover, by the Cesàro mean theorem and the properties of limits, we can show that

$$
\lim_{t\to\infty}\sum_{m=0}^{t-1}D(Z_m^{(5)}) \quad\text{and}\quad \lim_{t\to\infty}\sum_{m=0}^{t-1}E(Z_m^{(5)})
$$

are also finite.
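Steps 6 and 7 can be illustrated with one simulated path. The sketch below assumes normal signals and illustrative values ($\mu = 1.5$, $\sigma = 0.5$, $\mu_0 = 0$, $\kappa_0 = 1$); the unweighted average of $\mu_m^2$ should approach $\mu^2 = 2.25$, while the weighted average should approach zero. The function name is hypothetical.

```python
import random


def squared_mean_averages(t, mu, sigma, mu0, kappa0, seed=0):
    """One realization of the Step 6 and Step 7 averages.

    Returns ((1/t) sum_{m<t} mu_m^2, (1/t) sum_{m<t} mu_m^2 / (kappa0 + m + 1)).
    """
    rng = random.Random(seed)
    s_prime = kappa0 * mu0             # S'_m, updated after each signal
    plain = weighted = 0.0
    for m in range(t):
        mu_m = s_prime / (kappa0 + m)
        plain += mu_m**2
        weighted += mu_m**2 / (kappa0 + m + 1)
        s_prime += rng.gauss(mu, sigma)
    return plain / t, weighted / t


print(squared_mean_averages(100000, 1.5, 0.5, 0.0, 1.0))
```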


Combining the results of Steps 1 to 7, each term contributes a finite amount to $\beta_t/t$ in the limit as $t\to\infty$, and thus $\lim_{t\to\infty}\beta_t/t$ is finite.

Having established that

$$
\lim_{t\to\infty}\frac{\beta_t}{t}
$$

is finite, we know that the numerator of the fraction in Eq. (9), once divided by $t$, remains bounded. It remains to observe that the correspondingly scaled denominator

$$
\frac{\kappa_t(\alpha_t-1)}{t}
$$

grows without bound as $t\to\infty$, or equivalently, that its reciprocal approaches zero. Therefore, the fraction in Eq. (9) approaches zero, as required.
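The conclusion of Appendix B can be illustrated by running the belief updates directly. This sketch assumes the standard normal-gamma conjugate updates ($\kappa_{t+1} = \kappa_t + 1$, $\alpha_{t+1} = \alpha_t + 1/2$, and the $\beta$ and $\mu$ updates that produce (22) and (23)) and assumes that the fraction in Eq. (9) is $\beta_t/(\kappa_t(\alpha_t-1))$; neither the $\alpha$ update nor Eq. (9) is reproduced in this appendix. Combining the limits of Steps 2, 4, and 6 suggests $\beta_t/t \to \sigma^2/2$, which the run should reflect. Signal parameters are illustrative.

```python
import random


def normal_gamma_update(signals, mu0, kappa0, alpha0, beta0):
    """Sequential conjugate updates; returns the final (mu, kappa, alpha, beta).

    Assumed rules (standard normal-gamma form, consistent with (22)-(23)):
      beta  <- beta + kappa * (x - mu)^2 / (2 * (kappa + 1))
      mu    <- (kappa * mu + x) / (kappa + 1)
      kappa <- kappa + 1,   alpha <- alpha + 1/2
    """
    mu, kappa, alpha, beta = mu0, kappa0, alpha0, beta0
    for x in signals:
        beta += kappa * (x - mu) ** 2 / (2 * (kappa + 1))  # uses the old mu and kappa
        mu = (kappa * mu + x) / (kappa + 1)
        kappa += 1
        alpha += 0.5
    return mu, kappa, alpha, beta


def variance_of_mean(kappa, alpha, beta):
    """beta / (kappa * (alpha - 1)), the fraction assumed here to be Eq. (9)."""
    return beta / (kappa * (alpha - 1))


rng = random.Random(2)
signals = [rng.gauss(1.0, 0.5) for _ in range(20000)]
for t in (100, 20000):
    mu, kappa, alpha, beta = normal_gamma_update(signals[:t], 0.0, 1.0, 2.0, 1.0)
    print(t, beta / t, variance_of_mean(kappa, alpha, beta))
```

With $\sigma = 0.5$, the ratio $\beta_t/t$ should settle near $0.125$, while the variance term keeps decreasing roughly like $\sigma^2/t$.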


C Proof of Theorem 6 


Based on the derivation in Eq. (13), which relates the prior variance $P(t+1|t)$ at time $t+1$ to the posterior variance $P(t)$ at time $t$, iterative calculation yields

$$
P_i(t) = \frac{P_i^0 R_i}{tP_i^0 + R_i}, \qquad t = 0, 1, 2, \ldots, T', \quad \forall i \in N. \tag{31}
$$

As $t$ approaches infinity, $P_i(t)$ converges to zero for any value of $R_i$; for large $t$ it behaves like $R_i/t$. Thus the variance $P_i(t)$ tends to zero, yielding an increasingly precise and stable estimate of the unknown parameter as time progresses. This convergence holds regardless of the initial covariance $P_i^0$ and the measurement uncertainty $R_i$, highlighting the long-run robustness and accuracy of the estimation process.
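Eq. (13) is not reproduced in this appendix. Assuming that, for a constant parameter, it reduces to the standard scalar update $P_i(t) = P_i(t-1)R_i/(P_i(t-1)+R_i)$, the following sketch checks the closed form (31) against that recursion and shows the $R_i/t$ behavior. The values $P_i^0 = 2$ and $R_i = 0.5$ are illustrative.

```python
import math


def variance_recursion(p0, r, t_max):
    """Iterate P(t) = P(t-1) * R / (P(t-1) + R) from P(0) = P0 (assumed form of Eq. (13))."""
    ps = [p0]
    for _ in range(t_max):
        prev = ps[-1]
        ps.append(prev * r / (prev + r))
    return ps


def variance_closed_form(p0, r, t):
    """Closed form (31): P_i(t) = P_i^0 R_i / (t P_i^0 + R_i)."""
    return p0 * r / (t * p0 + r)


ps = variance_recursion(2.0, 0.5, 10000)
print(math.isclose(ps[10], variance_closed_form(2.0, 0.5, 10)))
print(10000 * ps[-1])  # t * P(t) approaches R = 0.5
```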


D Proof of Theorem 7 


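We begin with the iterative Eq. (15), which, written in terms of the initial estimate and the signals, reads

$$
\hat{\tau}_i(t) = \left(\prod_{h=1}^{t}\frac{R_i}{R_i+P_i(h-1)}\right)\hat{\tau}_i(0) + \sum_{m=1}^{t}\left(\prod_{h=m+1}^{t}\frac{R_i}{R_i+P_i(h-1)}\right)\frac{P_i(m-1)}{R_i+P_i(m-1)}Y_i(m), \tag{32}
$$

where $P_i(h)$ is the variance of the estimate of $\tau_i$ at time $h$, $R_i$ is the variance of the signals, and $Y_i(m)$ is the signal received at time $m$.

For ease of computation, we treat the parts of the equation separately. Using (31), we first calculate the product

$$
\prod_{h=1}^{t}\frac{R_i}{R_i+P_i(h-1)} = \prod_{h=1}^{t}\frac{R_i+(h-1)P_i^0}{R_i+hP_i^0} = \frac{R_i}{R_i+tP_i^0}. \tag{33}
$$

Similarly,

$$
\prod_{h=m+1}^{t}\frac{R_i}{R_i+P_i(h-1)} = \frac{R_i+mP_i^0}{R_i+tP_i^0}. \tag{34}
$$

With these products calculated, we proceed to compute

$$
\frac{P_i(m-1)}{R_i+P_i(m-1)} = \frac{P_i^0}{R_i+mP_i^0}. \tag{35}
$$

Substituting (33), (34), and (35) into (32) gives

$$
\hat{\tau}_i(t) = \frac{R_i}{R_i+tP_i^0}\hat{\tau}_i(0) + \frac{P_i^0}{R_i+tP_i^0}\sum_{m=1}^{t}Y_i(m) = \frac{R_i}{R_i+tP_i^0}\hat{\tau}_i(0) + \frac{tP_i^0}{R_i+tP_i^0}\cdot\frac{1}{t}\sum_{m=1}^{t}Y_i(m). \tag{36}
$$

By the law of large numbers and the observation equation (11), as $t$ tends to infinity,

$$
\lim_{t\to\infty}\frac{1}{t}\sum_{m=1}^{t}Y_i(m) = \tau_i^*.
$$

Moreover, $R_i/(R_i+tP_i^0)\to 0$ and $tP_i^0/(R_i+tP_i^0)\to 1$. Therefore, simplifying Eq. (36),

$$
\hat{\tau}_i(t) \to \tau_i^* \quad \text{as } t\to\infty. \tag{37}
$$

This demonstrates that the estimate $\hat{\tau}_i(t)$ tends towards the true value $\tau_i^*$ as time progresses.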
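The following sketch runs a scalar Kalman filter for a constant parameter and compares it with the unrolled form (36). It assumes that (15) is the standard update $\hat{\tau}_i(m) = \hat{\tau}_i(m-1) + K_m(Y_i(m) - \hat{\tau}_i(m-1))$ with gain $K_m = P_i(m-1)/(P_i(m-1)+R_i)$ (the weights in (32) are exactly $1-K_h$ and $K_m$), and that the noise in (11) is Gaussian. The values of `tau_star`, `r`, and the prior are illustrative.

```python
import random


def kalman_static(tau0, p0, r, signals):
    """Scalar Kalman filter for a constant parameter observed as Y(m) = tau* + noise."""
    tau_hat, p = tau0, p0
    for y in signals:
        gain = p / (p + r)                      # K = P(m-1) / (P(m-1) + R)
        tau_hat = tau_hat + gain * (y - tau_hat)
        p = p * r / (p + r)                     # variance update consistent with (31)
    return tau_hat, p


def unrolled_estimate(tau0, p0, r, signals):
    """Closed form (36): (R * tau0 + P0 * sum(Y)) / (R + t * P0)."""
    t = len(signals)
    return (r * tau0 + p0 * sum(signals)) / (r + t * p0)


rng = random.Random(3)
tau_star, r = 3.0, 0.5
signals = [rng.gauss(tau_star, r ** 0.5) for _ in range(5000)]
est, p_final = kalman_static(0.0, 2.0, r, signals)
print(est, unrolled_estimate(0.0, 2.0, r, signals), p_final)
```

The recursive and unrolled estimates agree up to floating-point rounding, and both approach `tau_star` as the number of signals grows.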


E Proof of Proposition 8 


To prove this, keep in mind that each type of player must play a best response. 




Denote by $V_1(S,t,\tau_1)$ the value function of player 1. The Hamilton–Jacobi–Bellman (HJB) equation of player 1 is given by

$$
V_1(S,t,\tau_1) = \max_{u_1(t,\tau_1,\cdot)}\Big\{u_1(t,\tau_1,\cdot)\big(a_1 - u_1(t,\tau_1,\cdot) - u_2(t,\cdot)\big) - \tau_1 S + V_1\big(\mu_t(u_1(t,\tau_1,\cdot) + u_2(t,\cdot) + \delta S), t+1, \tau_1\big)\Big\}. \tag{38}
$$

Considering the linear-state structure of our model, we conjecture that the value function is linear:

$$
V_1(S,t,\tau_1) = A_1(t,\tau_1)S + B_1(t,\tau_1). \tag{39}
$$

Plugging the conjectured value function into (38), we obtain

$$
\begin{aligned}
A_1(t,\tau_1)S + B_1(t,\tau_1) = \max_{u_1(t,\tau_1,\cdot)}\Big\{ & u_1(t,\tau_1,\cdot)\big(a_1 - u_1(t,\tau_1,\cdot) - u_2(t,\cdot)\big) - \tau_1 S \\
& + A_1(t+1,\tau_1)\mu_t\big(u_1(t,\tau_1,\cdot) + u_2(t,\cdot) + \delta S\big) + B_1(t+1,\tau_1)\Big\}.
\end{aligned} \tag{40}
$$


The first-order equilibrium condition is given by

$$
u_1^*(t,\tau_1,\cdot) = \frac{a_1 - u_2(t,\cdot) + A_1(t+1,\tau_1)\mu_t}{2}. \tag{41}
$$

From (41), $u_1^*$ is independent of the state $S$. To find the coefficients of the value function, we substitute $u_1^*(t)$ into (40) and equate the coefficients of $S$, using the terminal condition $V_1(S,T'+1,\tau_1) = 0$; that is,

$$
A_1(t,\tau_1) = -\tau_1 + A_1(t+1,\tau_1)\mu_t\delta, \qquad A_1(T'+1,\tau_1) = 0. \tag{42}
$$

Solving (42) with $\mu_t$ held fixed within the subgame starting at $t$, we get

$$
A_1(t,\tau_1) = -\tau_1\frac{1-(\mu_t\delta)^{T'+1-t}}{1-\mu_t\delta}. \tag{43}
$$

To simplify notation, we define

$$
c_1(t) = -\frac{1-(\mu_t\delta)^{T'-t}}{1-\mu_t\delta},
$$

so that $A_1(t+1,\tau_1)$ can be written as

$$
A_1(t+1,\tau_1) = \tau_1 c_1(t). \tag{44}
$$


Remark: $A_1(t)$ depends on the player’s prior estimate $\mu_t$ of the uncertainty in the motion equation at time $t$. When calculating $A_1(t+1)$, we do not replace $\mu_t$ with $\mu_{t+1}$, because doing so would imply that the player’s control at time $t$ depends on their estimate at time $t+1$. This contradicts the fact that the player does not receive the signal $X_t$ until after making a control decision. In other words, when the player considers a subgame starting at time $t$, the available signals include only those received before time $t$. Therefore, within this subgame, the estimate of the unknown parameters remains constant over time, since no new signals are received.
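To check the closed form (43) and the identity (44) against the recursion (42), the sketch below iterates (42) backward from the terminal condition with $\mu_t$ held fixed, as the remark requires. The parameter values ($\tau_1 = 1.2$, $\mu_t = 0.9$, $\delta = 0.8$, $T' = 10$) are illustrative assumptions, and the function names are hypothetical.

```python
import math


def a1_backward(tau1, mu_t, delta, t, horizon):
    """Iterate (42) backward from A_1(T'+1) = 0, holding mu_t fixed within the subgame."""
    a = 0.0
    for _ in range(horizon + 1 - t):          # produces A_1(T'), A_1(T'-1), ..., A_1(t)
        a = -tau1 + a * mu_t * delta
    return a


def a1_closed(tau1, mu_t, delta, t, horizon):
    """Closed form (43)."""
    q = mu_t * delta
    return -tau1 * (1 - q ** (horizon + 1 - t)) / (1 - q)


def c1(mu_t, delta, t, horizon):
    """c_1(t) as defined before (44)."""
    q = mu_t * delta
    return -(1 - q ** (horizon - t)) / (1 - q)


tau1, mu_t, delta, horizon = 1.2, 0.9, 0.8, 10
print(all(math.isclose(a1_backward(tau1, mu_t, delta, t, horizon),
                       a1_closed(tau1, mu_t, delta, t, horizon))
          for t in range(horizon + 1)))
print(a1_backward(tau1, mu_t, delta, 3, horizon), tau1 * c1(mu_t, delta, 2, horizon))
```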

Therefore, we can rewrite (41) as

$$
u_1^*(t,\tau_1,\cdot) = \frac{a_1 - u_2(t,\cdot) + c_1(t)\tau_1\mu_t}{2}. \tag{45}
$$




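To solve for $u_2^*(t,\cdot)$, we write the HJB equation of player 2:

$$
\begin{aligned}
V_2(S,t) = \max_{u_2(t,\cdot)}\Big\{ & \sum_{\tau_1\in T_1}\Big[\big(u_2(t,\cdot)(a_2 - u_2(t,\cdot) - u_1^*(t,\tau_1,\cdot)) - b_2 S\big)\, p_2(\tau_1\,|\,y_1(t-1))\Big] \\
& + \sum_{\tau_1\in T_1}\Big[V_2\big(\mu_t(u_1^*(t,\tau_1,\cdot) + u_2(t,\cdot) + \delta S), t+1\big)\, p_2(\tau_1\,|\,y_1(t-1))\Big]\Big\}.
\end{aligned} \tag{46}
$$

Again plugging a conjectured linear value function $V_2(S,t) = A_2(t)S + B_2(t)$ into (46), we get

$$
\begin{aligned}
A_2(t)S + B_2(t) = \max_{u_2(t,\cdot)}\Big\{ & \sum_{\tau_1\in T_1}\Big[\big(u_2(t,\cdot)(a_2 - u_2(t,\cdot) - u_1^*(t,\tau_1,\cdot)) - b_2 S\big)\, p_2(\tau_1\,|\,y_1(t-1))\Big] \\
& + \sum_{\tau_1\in T_1}\Big(A_2(t+1)\mu_t\big(u_1^*(t,\tau_1,\cdot) + u_2(t,\cdot) + \delta S\big) + B_2(t+1)\Big)\, p_2(\tau_1\,|\,y_1(t-1))\Big\}.
\end{aligned} \tag{47}
$$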


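From (45), $u_1^*(t,\tau_1,\cdot)$ is a linear function of $\tau_1$, so we can write $u_1^*(t,\tau_1,\cdot) = f_1(t) + f_2(t)\tau_1$. Therefore, we can simplify (47) as

$$
\begin{aligned}
A_2(t)S + B_2(t) = \max_{u_2(t,\cdot)}\Big\{ & u_2(t,\cdot)\big(a_2 - u_2(t,\cdot) - f_1(t)\big) - b_2 S \\
& + A_2(t+1)\mu_t\big(f_1(t) + u_2(t,\cdot) + \delta S\big) + B_2(t+1) \\
& + \big(A_2(t+1)\mu_t f_2(t) - u_2(t,\cdot)f_2(t)\big)\sum_{\tau_1\in T_1}\tau_1\, p_2(\tau_1\,|\,y_1(t-1))\Big\}.
\end{aligned} \tag{48}
$$

According to the definition of mathematical expectation and the Bayesian filter, we get

$$
\sum_{\tau_1\in T_1}\tau_1\, p_2(\tau_1\,|\,y_1(t-1)) = \hat{\tau}_1(t). \tag{49}
$$

Therefore,

$$
\begin{aligned}
A_2(t)S + B_2(t) = \max_{u_2(t,\cdot)}\Big\{ & u_2(t,\cdot)\big(a_2 - u_2(t,\cdot) - f_1(t)\big) - b_2 S - u_2(t,\cdot)f_2(t)\hat{\tau}_1(t) \\
& + A_2(t+1)\mu_t\big(f_1(t) + f_2(t)\hat{\tau}_1(t) + u_2(t,\cdot) + \delta S\big) + B_2(t+1)\Big\}.
\end{aligned} \tag{50}
$$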


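The first-order equilibrium condition is given by

$$
u_2^*(t,\cdot) = \frac{a_2 - f_1(t) - f_2(t)\hat{\tau}_1(t) + A_2(t+1)\mu_t}{2}, \tag{51}
$$

where $f_1(t) = \frac{a_1}{2} - \frac{u_2(t,\cdot)}{2}$ and $f_2(t) = \frac{c_1(t)\mu_t}{2}$. Equating the coefficients of $S$ in (50) gives $A_2(t) = -b_2 + A_2(t+1)\mu_t\delta$ with $A_2(T'+1) = 0$, so that $A_2(t+1) = b_2 c_1(t)$. Substituting these expressions into (51) and solving for $u_2^*$, we can rewrite it as

$$
u_2^*(t;\hat{\tau}_1(t),\mu_t) = \frac{2}{3}a_2 - \frac{1}{3}a_1 - \frac{c_1(t)\mu_t}{3}\hat{\tau}_1(t) + \frac{2c_1(t)\mu_t}{3}b_2, \tag{52}
$$

where $c_1(t) = -\frac{1-(\mu_t\delta)^{T'-t}}{1-\mu_t\delta}$.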




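Substituting (52) into (45), we obtain the optimal strategy of player 1:

$$
u_1^*(t,\tau_1;\hat{\tau}_1(t),\mu_t) = \frac{2}{3}a_1 - \frac{1}{3}a_2 + \frac{c_1(t)\mu_t}{2}\tau_1 + \frac{c_1(t)\mu_t}{6}\hat{\tau}_1(t) - \frac{c_1(t)\mu_t}{3}b_2.
$$

The above calculation relies on the assumption that player 2’s belief $\hat{\tau}_1(t)$ is public information.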
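As a sanity check on the equilibrium strategies, the sketch below evaluates $u_1^*$ and $u_2^*$ for illustrative parameter values (assumptions, not values from the paper) and verifies that they satisfy player 1's best response (45) and player 2's first-order condition (51), using $A_2(t+1) = b_2c_1(t)$, which follows from equating the coefficients of $S$ in (50). The helper names are hypothetical.

```python
import math


def c1(mu_t, delta, t, horizon):
    """c_1(t) = -(1 - (mu_t*delta)^(T'-t)) / (1 - mu_t*delta)."""
    q = mu_t * delta
    return -(1 - q ** (horizon - t)) / (1 - q)


def equilibrium_controls(a1, a2, b2, tau1, tau1_hat, mu_t, delta, t, horizon):
    """Return (u1*, u2*) from the closed-form strategies of player 1 and player 2 (52)."""
    k = c1(mu_t, delta, t, horizon) * mu_t     # shorthand k = c_1(t) * mu_t
    u2 = 2 * a2 / 3 - a1 / 3 - k * tau1_hat / 3 + 2 * k * b2 / 3
    u1 = 2 * a1 / 3 - a2 / 3 + k * tau1 / 2 + k * tau1_hat / 6 - k * b2 / 3
    return u1, u2


# Illustrative values only; check the first-order conditions (45) and (51).
a1, a2, b2, tau1, tau1_hat, mu_t, delta, t, horizon = 10.0, 8.0, 0.5, 1.2, 1.0, 0.9, 0.8, 2, 10
u1, u2 = equilibrium_controls(a1, a2, b2, tau1, tau1_hat, mu_t, delta, t, horizon)
k = c1(mu_t, delta, t, horizon) * mu_t
foc_45 = (a1 - u2 + k * tau1) / 2                  # player 1's best response (45)
f1, f2 = (a1 - u2) / 2, k / 2                      # u1* = f1 + f2 * tau1
foc_51 = (a2 - f1 - f2 * tau1_hat + b2 * k) / 2    # (51) with A_2(t+1) = b2 * c_1(t)
print(math.isclose(u1, foc_45), math.isclose(u2, foc_51))
```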


F Proof of Theorem 9 


Given that the limits $\lim_{t\to\infty}\mu_t = \mu$ and $\lim_{t\to\infty}\hat{\tau}_1(t) = \tau_1^*$ have been established in Thm. 3 and Thm. 7, the convergence of the control strategies can be inferred as follows:

1. Convergence of $\mu_t$ and $\hat{\tau}_1(t)$: Since $\mu_t$ and $\hat{\tau}_1(t)$ converge to $\mu$ and $\tau_1^*$, respectively, these steady-state values can be substituted into the optimal control strategies.
2. Behavior of $c_1(t)$: As $t$ tends to infinity, the term $(\mu_t\delta)^{T'-t}$ approaches zero (provided $|\mu\delta| < 1$ and the remaining horizon $T'-t$ grows), so $c_1(t)$ converges to the finite constant $c_1 = -\frac{1}{1-\mu\delta}$.
3. Substitution into control strategies: Replacing $\mu_t$ with $\mu$, $\hat{\tau}_1(t)$ with $\tau_1^*$, and $c_1(t)$ with $c_1$ in the expressions for $u_1^*$ and $u_2^*$ eliminates the time dependence of the control strategies.
4. Resulting convergence: $u_1^*$ and $u_2^*$ therefore converge to constant values that are functions of $\mu$ and $\tau_1^*$.

Therefore, as $t$ approaches infinity, the control strategies $u_1^*(t,\tau_1;\hat{\tau}_1(t),\mu_t)$ and $u_2^*(t;\hat{\tau}_1(t),\mu_t)$ converge to their respective steady-state values, reflecting the alignment of the players’ strategies with the true state of the system. This convergence is a direct consequence of the convergence of their estimates of the ecological uncertainty and the cost parameters.
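The limit in step 2 and the resulting steady-state strategies can be evaluated directly. This sketch assumes $|\mu\delta| < 1$ and uses illustrative parameter values; it checks that $c_1(t)$ approaches $-1/(1-\mu\delta)$ over a long remaining horizon and computes the limiting strategies. The helper names are hypothetical.

```python
import math


def c1(mu_t, delta, t, horizon):
    """c_1(t) = -(1 - (mu_t*delta)^(T'-t)) / (1 - mu_t*delta)."""
    q = mu_t * delta
    return -(1 - q ** (horizon - t)) / (1 - q)


def steady_state_controls(a1, a2, b2, tau1_star, mu, delta):
    """Limits of u1* and u2* with mu_t -> mu, tau1_hat -> tau1*, c_1(t) -> -1/(1 - mu*delta)."""
    k = -mu / (1 - mu * delta)                 # limiting value of c_1(t) * mu_t
    u2 = 2 * a2 / 3 - a1 / 3 - k * tau1_star / 3 + 2 * k * b2 / 3
    u1 = 2 * a1 / 3 - a2 / 3 + k * tau1_star / 2 + k * tau1_star / 6 - k * b2 / 3
    return u1, u2


mu, delta = 0.9, 0.8                           # |mu * delta| < 1 is assumed
print(math.isclose(c1(mu, delta, 0, 500), -1 / (1 - mu * delta)))
print(steady_state_controls(10.0, 8.0, 0.5, 1.2, mu, delta))
```

The limiting strategies still satisfy the first-order conditions (45) and (51) once $\hat{\tau}_1(t)$ is replaced by $\tau_1^*$, which is what one would expect if the estimates have converged.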


